INQUIRING LINE

How do agents automatically generate suitable learning tasks based on current capability?

This explores how agents propose their own next learning tasks—an automatic curriculum that stays calibrated to what the agent can currently do, rather than tasks a human curated in advance.


The corpus frames the problem first by showing why a fixed syllabus fails: when agents train only on static expert demonstrations, their competence is capped by what the dataset's curators happened to imagine, and they never face the situations their own growing skill makes reachable (see "Can agents learn beyond what their training data shows?"). A capability-matched curriculum is the escape: the agent should keep being handed tasks just past its current edge.

The clearest worked example is VOYAGER, which pairs an externalized skill library with an *automatic curriculum* that proposes new exploration goals based on what the agent has already mastered and what its environment now affords. Because skills are stored and composed rather than baked into weights, each newly acquired skill widens the set of tasks the curriculum can sensibly suggest next, so learning and task generation feed each other (see "Can agents learn new skills without forgetting old ones?"). The interesting move is that 'suitable difficulty' isn't measured abstractly; it emerges from the current contents of the skill library plus environmental feedback about what just succeeded or failed.
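To make the idea of difficulty emerging from the library's contents concrete, here is a minimal sketch in Python. It is not VOYAGER's implementation (which prompts a language model to propose goals); it assumes a hand-written task list with explicit prerequisites, and every name in it (`Task`, `SkillLibrary`, `propose_next`, `run`, `max_failures`) is hypothetical. A task counts as suitable when its prerequisites are already in the library, its skill is not yet learned, and it has not failed too often.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    name: str
    requires: frozenset  # skills the task presupposes
    teaches: str         # skill added to the library on success


@dataclass
class SkillLibrary:
    skills: set = field(default_factory=set)

    def add(self, skill: str) -> None:
        self.skills.add(skill)


def propose_next(tasks, library, failures, max_failures=2):
    """Return frontier tasks: prerequisites met, skill not yet learned,
    and not already failed max_failures times."""
    frontier = [
        t for t in tasks
        if t.requires <= library.skills
        and t.teaches not in library.skills
        and failures.get(t.name, 0) < max_failures
    ]
    # Prefer tasks that build on more known skills; break ties by name.
    return sorted(frontier, key=lambda t: (-len(t.requires), t.name))


def run(tasks, attempt, steps=10):
    """Alternate between proposing a task and attempting it.
    `attempt(task) -> bool` stands in for environmental feedback."""
    library, failures, log = SkillLibrary(), {}, []
    for _ in range(steps):
        candidates = propose_next(tasks, library, failures)
        if not candidates:
            break
        task = candidates[0]
        if attempt(task):
            library.add(task.teaches)  # success widens the frontier
            log.append(task.name)
        else:
            failures[task.name] = failures.get(task.name, 0) + 1
    return library, log


# Hypothetical task graph, loosely in the spirit of a crafting game.
TASKS = [
    Task("collect wood", frozenset(), "wood"),
    Task("craft table", frozenset({"wood"}), "table"),
    Task("craft pickaxe", frozenset({"wood", "table"}), "pickaxe"),
    Task("mine iron", frozenset({"pickaxe"}), "iron"),
]

library, log = run(TASKS, lambda task: True)
print(log)
```

With an `attempt` that always succeeds, the tasks are proposed in dependency order, because each new skill unlocks the next task. With one that always fails, the loop stops after `max_failures` tries of the only reachable task, which is the failure half of the feedback signal.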

A second route to capability-matched tasks is to let the agent's own activity generate them. Every action an agent takes produces a next-state signal (a tool's output, an error message, a changed screen, a user reply), and these signals are themselves a live stream of learning opportunities calibrated to exactly what the agent is doing right now (see "Can agent deployment itself generate training signals automatically?"). Related work makes that loop tighter still: creating a skill *inside* the reasoning loop, at the moment a gap appears, grounds the new task in exact runtime context rather than an offline guess about what might be useful (see "Does creating skills inside the agent loop eliminate mismatches?"). Reflexion shows the minimal version: after a success/failure signal, the agent writes itself a diagnosis that effectively sets up its next attempt (see "Can agents learn from failure without updating their weights?").
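The Reflexion-style loop described above can be sketched in a few lines. This is an illustrative outline, not the Reflexion codebase: `reflexion_loop`, `toy_attempt`, and `toy_reflect` are hypothetical names, and in a real agent both the attempt and the reflection would be model calls rather than the string checks used here.

```python
def reflexion_loop(attempt, reflect, max_trials=3):
    """Retry a task, carrying forward self-written diagnoses.

    attempt(reflections) -> (success: bool, observation: str)
    reflect(observation) -> str
    Returns (trial number of first success or None, reflections).
    """
    reflections = []  # kept verbatim, not summarized
    for trial in range(1, max_trials + 1):
        success, observation = attempt(list(reflections))
        if success:
            return trial, reflections
        # A failure signal triggers a diagnosis for the next attempt.
        reflections.append(reflect(observation))
    return None, reflections


def toy_attempt(reflections):
    # Stand-in environment: succeeds only once an earlier reflection
    # has mentioned the missing 'timeout' argument.
    if any("timeout" in r for r in reflections):
        return True, "ok"
    return False, "error: missing required argument 'timeout'"


def toy_reflect(observation):
    # Stand-in for a model-written diagnosis of the failure.
    return f"Last attempt failed with: {observation}. Address it before retrying."


trial, notes = reflexion_loop(toy_attempt, toy_reflect)
print(trial, len(notes))
```

No weights change anywhere in this loop; the only thing that differs between the first and second trial is the list of reflections passed back in.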

Here's the turn you might not expect: the most capable version of this may not be the agent grading itself. SkillOS separates a *trainable curator* from a frozen executor, and the curator learns to evolve the skill repository toward genuinely strategic meta-skills rather than the verbose, generic additions a self-curating agent tends to accumulate (see "Can a separate trained curator improve skill libraries better than frozen agents?"). In other words, deciding which task or skill is 'suitable next' is itself a learnable job worth dedicating a separate, RL-trained system to. FlowReasoner pushes a sibling idea: a meta-agent that generates a bespoke workflow per query using execution feedback, choosing structure to fit the demand in front of it (see "Can AI systems design unique multi-agent workflows per individual query?").
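The curator/executor split can be illustrated with a toy example. Note the substitution: SkillOS trains its curator with reinforcement learning, whereas this sketch swaps in a greedy keep-if-it-helps rule just to show the separation of roles. The function names, the dictionary layout of a skill, and the scoring rule are all assumptions made for illustration.

```python
def curate(executor, tasks, repository, candidates):
    """Greedy stand-in for a trained curator.

    The executor is never modified; the curator only decides which
    candidate skills enter the repository, admitting one only if it
    raises the executor's mean score on the task stream.
    """
    def score(repo):
        return sum(executor(task, repo) for task in tasks) / len(tasks)

    repo = list(repository)
    best = score(repo)
    for candidate in candidates:
        trial_repo = repo + [candidate]
        trial_score = score(trial_repo)
        if trial_score > best:
            repo, best = trial_repo, trial_score
    return repo, best


def toy_executor(task, repo):
    # Frozen stand-in executor: a task succeeds if any skill covers it.
    return 1.0 if any(task in skill["covers"] for skill in repo) else 0.0


tasks = ["login", "search", "checkout"]
candidates = [
    {"name": "be careful and thorough", "covers": set()},
    {"name": "fill form then submit", "covers": {"login", "checkout"}},
    {"name": "submit login form", "covers": {"login"}},
]

repo, best = curate(toy_executor, tasks, [], candidates)
print([skill["name"] for skill in repo], best)
```

In this toy run the generic advice is rejected because it changes no outcome, the cross-task skill is admitted, and the narrower skill is rejected as redundant. That mirrors, in miniature, the shift away from verbose generic additions that the note attributes to a dedicated curator.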

The through-line across these notes: a 'suitable' task is defined relative to a representation of current capability, whether a skill library, an episodic memory, a versioned capability vector (see "Can semantic capability vectors replace manual agent routing?"), or reusable sub-task routines mined from past runs (see "Can agents learn reusable sub-task routines from past experience?"). The agents that generate good next tasks are the ones that have externalized what they currently know into something a curriculum can read and extend (see "Where does agent reliability actually come from?"). The frontier question the corpus leaves open is whether that curating judgment is best left to the agent itself or handed to a separate system trained specifically to pick what comes next.


Sources (10 notes)

Can agents learn beyond what their training data shows?

Agents trained on static expert datasets cannot learn from their own failures or generalize beyond demonstrated scenarios because they never interact with environments during training. Competence is capped by what curators imagined, not by agent capacity.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Can agent deployment itself generate training signals automatically?

Every agent action produces a next-state signal (user reply, tool output, error, GUI change) that can train the policy directly. This universal signal source eliminates the need for separate training datasets across conversations, terminal tasks, SWE, and tool use.

Does creating skills inside the agent loop eliminate mismatches?

MUSE-Autoskill demonstrates that invoking skill creation from within the agent's reasoning loop grounds new skills in exact task context, immediate feedback, and runtime validation. In-loop skills reach 87.94% task accuracy and transfer to other agents with minimal loss, eliminating the situated context problem of offline authoring.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.

Can a separate trained curator improve skill libraries better than frozen agents?

SkillOS shows that separating a trainable curator from a frozen executor, grouped by task streams, causes skill repositories to shift from generic verbose additions toward actionable execution logic and cross-task meta-strategies. The trained curator generalizes across different executor backbones and domains.

Can AI systems design unique multi-agent workflows per individual query?

FlowReasoner demonstrates that meta-agents trained with reinforcement learning and external execution feedback can generate unique multi-agent architectures for each user query, optimizing across performance, complexity, and efficiency—moving beyond fixed task-level workflow templates.

Can semantic capability vectors replace manual agent routing?

Versioned Capability Vectors embedded in HNSW indices couple semantic matching with policy and budget constraints, making capability discovery a first-class operation that scales sub-linearly as agent heterogeneity increases.

Can agents learn reusable sub-task routines from past experience?

Agent Workflow Memory induces sub-task routines at finer granularity than full tasks, abstracts example-specific values, and compounds them hierarchically. This produces 24.6% relative gain on Mind2Web and 51.1% on WebArena, with larger gains as train-test gaps widen.

Where does agent reliability actually come from?

Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst. The question remains open: How do agents automatically generate suitable learning tasks based on current capability?

What a curated library found—and when (dated claims, not current truth): Findings span 2023–2026, tracking curriculum learning and skill externalization in agentic systems.

• VOYAGER paired externalized skill libraries with automatic curricula; task suitability emerged from skill-library contents + environmental feedback, not abstract difficulty metrics (~2024).
• Agents that generate good next tasks externalize current knowledge into readable representations—skill libraries, episodic memory, versioned capability vectors—that a curriculum can extend (~2025–2026).
• Self-curation by the agent itself tends to accumulate verbose, generic skills; RL-trained separate curators (SkillOS) learn to evolve repositories toward genuinely strategic meta-skills instead (~2026).
• Agent interactions produce live next-state signals (tool outputs, errors, screen changes) that function as universally calibrated learning opportunities matched to exactly what the agent is doing (~2025).
• Coupling skill creation to runtime gaps—rather than offline prediction—grounds new tasks in exact context; Reflexion exemplified this with post-action verbal reflection stored as episodic memory (~2024–2025).

Anchor papers (verify; mind their dates):
• arXiv:2501.06252 Transformer²: Self-adaptive LLMs (2025)
• arXiv:2605.06614 SkillOS: Learning Skill Curation for Self-Evolving Agents (2026)
• arXiv:2504.15257 FlowReasoner: Reinforcing Query-Level Meta-Agents (2025)
• arXiv:2604.08224 Externalization in LLM Agents (2026)

Your task:
(1) RE-TEST EACH CONSTRAINT. For every claim above—skill-library curricula, self-curation vs. separate curators, live-signal learning, runtime-coupled skill creation—assess whether newer models, training regimes, multi-agent orchestration, or evaluation harnesses have since relaxed, overturned, or deepened these findings. Separate the durable question (which representations of capability best enable task generation?) from perishable limitations (does self-curation still underperform curator-RL?). Cite what resolved each.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months—especially any showing that fixed curricula, centralized curation, or agent-agnostic task generation outperforms capability-matched approaches, or that the overhead of externalization is unnecessary.
(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., can multimodal capability vectors (vision + reasoning + tool use) be read live without memorization overhead? Does federated curation across agent collectives outperform per-agent learning?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
