INQUIRING LINE

Should we train the evolver or the executor when building self-improving agents?

This explores a design fork in self-improving agents: do you put the learning into the executor that performs the task, or into a separate 'evolver' that rewrites the agent's skills, prompts, and harness? The corpus increasingly points toward training the evolver while freezing the executor.


This reads the question as a problem of where to place the learning. A self-improving agent has two layers: the executor that acts in the environment, and an evolver/curator that revises what the executor knows and how it is wired. The corpus's most direct answer comes from SkillOS ("Can a separate trained curator improve skill libraries better than frozen agents?"), which keeps the executor frozen and trains only the curator. The payoff is that the curator learns to push skill repositories away from generic, verbose additions and toward actionable execution logic and cross-task meta-strategies. Crucially, the trained curator also generalizes across different executor backbones. That last detail is the real argument for training the evolver: what is learned transfers instead of being baked into one model's weights.
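
To make the two-layer split concrete, here is a minimal Python sketch under toy assumptions: the executor is a fixed scoring function with no trainable state, and the curator is a softmax policy over three hypothetical edit types (`generic_verbose`, `actionable_logic`, `meta_strategy`) trained with a plain REINFORCE update on the executor's graded success. None of these names, numbers, or the choice of REINFORCE come from SkillOS. The sketch only shows where the learned parameters live; it does not demonstrate cross-backbone transfer.

```python
import math
import random
from dataclasses import dataclass, field

# Hypothetical kinds of edit a curator can make to a skill library.
EDIT_TYPES = ["generic_verbose", "actionable_logic", "meta_strategy"]


def frozen_executor(library: list[str]) -> float:
    """Stand-in for a frozen executor: a fixed function with no trainable state.
    Toy assumption: its success rate depends only on what the library contains."""
    score = 0.2
    if "actionable_logic" in library:
        score += 0.5
    if "meta_strategy" in library:
        score += 0.2
    score -= 0.05 * library.count("generic_verbose")  # verbose clutter costs a little
    return max(0.0, min(1.0, score))


@dataclass
class Curator:
    """The trainable layer: a softmax policy over edit types.
    Its logits are the only learned parameters in the whole system."""
    lr: float = 0.5
    logits: dict[str, float] = field(default_factory=lambda: {e: 0.0 for e in EDIT_TYPES})

    def probs(self) -> dict[str, float]:
        z = sum(math.exp(v) for v in self.logits.values())
        return {e: math.exp(v) / z for e, v in self.logits.items()}

    def propose(self, rng: random.Random) -> str:
        p = self.probs()
        return rng.choices(EDIT_TYPES, weights=[p[e] for e in EDIT_TYPES])[0]

    def update(self, edit: str, advantage: float) -> None:
        # REINFORCE for a softmax policy: d log p(edit) / d logit_e = 1[e == edit] - p(e).
        p = self.probs()
        for e in EDIT_TYPES:
            self.logits[e] += self.lr * advantage * ((1.0 if e == edit else 0.0) - p[e])


def train_curator(steps: int = 300, seed: int = 0) -> Curator:
    rng = random.Random(seed)
    curator = Curator()
    for _ in range(steps):
        library: list[str] = []                      # fresh library each step (toy simplification)
        baseline = frozen_executor(library)
        edit = curator.propose(rng)
        reward = frozen_executor(library + [edit])   # the environment grades the edit
        curator.update(edit, reward - baseline)      # only the curator changes
    return curator
```

Calling `train_curator().probs()` should show probability mass shifting toward `actionable_logic`, the edit type this toy executor rewards most, while `frozen_executor` itself is never modified.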

A supporting clue explains why the executor is the wrong place to invest. The finding that harness-improvement quality is flat across model tiers ("Do stronger models always evolve their own harnesses better?") shows that generating useful updates is not bottlenecked by raw model strength: even smaller models write comparable edits. The bottleneck is activating and following those updates, an ability that is non-monotonic in model strength and peaks at mid-tier models. So pouring capability into the executor buys less than you might expect; the leverage is in the evolution loop, not the actor.

But 'train the evolver' has a precondition the corpus is blunt about: the evolver needs a real external signal, or it eats itself. The self-improvement mirage note ("Can models reliably improve themselves without external feedback?") argues that pure self-improvement stalls on the generation-verification gap, diversity collapse, and reward hacking; every method that actually works smuggles in an external anchor (past versions, third-party judges, user corrections, tool feedback). The successful evolvers in this collection all obey that rule. The Darwin Gödel Machine ("Can AI systems improve themselves through trial and error?") replaces formal proofs with empirical benchmarking while maintaining an archive of agent variants; FlowReasoner ("Can AI systems design unique multi-agent workflows per individual query?") trains a meta-agent with reinforcement learning on external execution feedback; SkillClaw ("How can agent systems share learned skills across users?") runs its autonomous evolver over trajectories aggregated across users. The evolver is trainable precisely because the environment grades it.
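
The claim that the evolver is trainable only because something outside it grades its output can be illustrated with a toy. The sketch below runs the same greedy, archive-keeping evolver twice under a two-edit budget: once scored by a stand-in for an external benchmark, and once by a stand-in self-judge that also rewards padding. The edit names, gains, and the 2.0 padding weight are invented for illustration, and the loop is not the Darwin Gödel Machine's actual procedure.

```python
from typing import Callable

# Hypothetical candidate edits: (name, true_gain, padding). "Padding" makes output
# look better to a lenient judge without making it any better at the task.
CANDIDATE_EDITS = [
    ("fix_parser_bug", 1.0, 0.0),
    ("add_retry_on_timeout", 0.5, 0.0),
    ("add_confident_summary", 0.0, 1.0),
    ("restate_task_verbosely", -0.2, 1.5),
]

Variant = frozenset[str]


def external_benchmark(v: Variant) -> float:
    """Stand-in for an external grader (held-out tests, tool feedback): counts only real gains."""
    return sum(gain for name, gain, _ in CANDIDATE_EDITS if name in v)


def self_judge(v: Variant) -> float:
    """Stand-in for unanchored self-evaluation that is swayed by padding (a toy model of reward hacking)."""
    return sum(gain + 2.0 * pad for name, gain, pad in CANDIDATE_EDITS if name in v)


def evolve(grade: Callable[[Variant], float], budget: int = 2) -> Variant:
    """Greedy evolver that keeps an archive of every scored variant.
    Each round it expands the best archived variant by every unused edit."""
    archive: dict[Variant, float] = {frozenset(): grade(frozenset())}
    for _ in range(budget):
        parent = max(archive, key=archive.get)
        for name, _, _ in CANDIDATE_EDITS:
            if name not in parent:
                child = parent | {name}
                archive[child] = grade(child)
    return max(archive, key=archive.get)
```

With these numbers the externally graded run keeps the two real fixes, while the self-graded run spends its whole budget on padding and ends with a negative true score even though its own grade is higher.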

The interesting wrinkle is that freezing the executor's weights does not mean freezing its behavior. A whole branch of the corpus evolves the executor's behavior through externalized memory while leaving its parameters untouched: VOYAGER's composable skill library ("Can agents learn new skills without forgetting old ones?") avoids the forgetting that weight-update-based methods suffer; Reflexion stores verbal self-diagnoses as episodic memory ("Can agents learn from failure without updating their weights?"); and ReasoningBank distills strategy hints from both successes and failures ("Can agents learn better from their failures than successes?"). Here the 'evolver' is whatever curates that store, and the SkillOS result says: don't curate it by hand or with a frozen agent; train something to curate it.
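
A minimal sketch of an externalized skill store in the spirit of VOYAGER's embedding-indexed library, assuming a toy bag-of-words vector in place of a real embedding model. `SkillLibrary`, its methods, and the rule that only environment-verified skills are admitted are hypothetical. The point is that the frozen executor's behavior changes only through what `context_for` places in front of it, and the `add` gate is the curation step that, per the SkillOS result, is better trained than hand-written.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model here."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class SkillLibrary:
    """External skill store (hypothetical). The executor's weights never change; only this store does."""
    skills: dict[str, tuple[str, Counter]] = field(default_factory=dict)  # name -> (code, description vector)

    def add(self, name: str, description: str, code: str, passed: bool) -> bool:
        # Curation gate: admit a skill only if it succeeded in the environment.
        if not passed:
            return False
        self.skills[name] = (code, embed(description))
        return True

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.skills, key=lambda n: cosine(q, self.skills[n][1]), reverse=True)
        return ranked[:k]

    def context_for(self, query: str, k: int = 2) -> str:
        # The text a frozen executor would see prepended to its prompt for this query.
        return "\n\n".join(self.skills[n][0] for n in self.retrieve(query, k))
```

After adding a verified `mine_wood` skill described as "chop a tree to collect wood logs", a query such as "collect wood from a tree" retrieves it first, while a skill that failed in the environment is never stored.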

The corpus is most candid about the ceiling when it turns to metacognition. Truly self-improving agents need intrinsic metacognition ("Can AI systems improve their own learning strategies?"), yet today's evolvers run on fixed, human-designed loops that break under domain shift or capability changes. So the deeper answer to 'evolver or executor' is: train the evolver, but the frontier is making the evolver able to revise its own learning strategy, not just the executor's skills. RLVMR's process rewards for planning, exploration, reflection, and monitoring ("Can RL agents learn to reason better, not just succeed?") are an early step toward training that metacognitive layer directly rather than treating it as a fixed scaffold. The thing you didn't know you wanted to know: the question isn't really executor versus evolver; it's how high up the meta-ladder you can afford to put the trainable part.
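
The RLVMR note describes programmatic rewards attached to meta-reasoning tags (planning, exploration, reflection, monitoring). The sketch below shows one hypothetical way such process terms could sit on top of an outcome reward. The conditions under which each tag earns a bonus, the 0.1 bonus, and the 0.2 repeat penalty are assumptions for illustration, not RLVMR's published rules. Most bonuses are conditioned on behavior (exploring a new action, reflecting after a failure) so that sprinkling tags alone earns little.

```python
from dataclasses import dataclass

META_TAGS = {"planning", "exploration", "reflection", "monitoring"}


@dataclass
class Step:
    tag: str          # meta-reasoning tag for this step, one of META_TAGS
    action: str       # environment action taken at this step
    ok: bool = True   # whether the environment reported the action as succeeding


def shaped_return(steps: list[Step], success: bool,
                  bonus: float = 0.1, repeat_penalty: float = 0.2) -> float:
    """Outcome reward plus hypothetical programmatic process rewards."""
    total = 1.0 if success else 0.0          # the outcome-only signal
    seen: set[str] = set()
    prev_failed = False
    for i, step in enumerate(steps):
        if step.tag not in META_TAGS:
            raise ValueError(f"unknown tag: {step.tag}")
        repeat = step.action in seen
        if repeat:
            total -= repeat_penalty          # discourage looping on the same action
        if step.tag == "planning" and i == 0:
            total += bonus                   # plan before acting
        elif step.tag == "exploration" and not repeat:
            total += bonus                   # exploration must actually try something new
        elif step.tag == "reflection" and prev_failed:
            total += bonus                   # reflection counts only right after a failure
        # "monitoring" earns no bonus in this sketch
        seen.add(step.action)
        prev_failed = not step.ok
    return total
```

Two trajectories that end the same way can now receive different returns: a run that keeps retrying `open drawer` is penalized relative to one that plans, explores, and reflects after a failure.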


Sources (11 notes)

Can a separate trained curator improve skill libraries better than frozen agents?

SkillOS shows that separating a trainable curator from a frozen executor, grouped by task streams, causes skill repositories to shift from generic verbose additions toward actionable execution logic and cross-task meta-strategies. The trained curator generalizes across different executor backbones and domains.

Do stronger models always evolve their own harnesses better?

Model strength doesn't bottleneck writing useful harness edits: even smaller models generate comparable improvements. But the ability to use those updates is non-monotonic, peaking at mid-tier models, with weak and strong models both struggling to activate and follow updated instructions.

Can models reliably improve themselves without external feedback?

Pure self-improvement stalls due to the generation-verification gap, diversity collapse, and reward hacking. Reliable improvement methods succeed by smuggling in external anchors: past model versions, third-party judges, user corrections, or tool feedback.

Can AI systems improve themselves through trial and error?

DGM replaces formal proofs with empirical benchmarking and maintains an evolutionary archive of agent variants, achieving 2.5× improvement on SWE-bench and 2.2× on Polyglot by discovering capabilities like better code editing and context management.

Can AI systems design unique multi-agent workflows per individual query?

FlowReasoner demonstrates that meta-agents trained with reinforcement learning and external execution feedback can generate unique multi-agent architectures for each user query, optimizing across performance, complexity, and efficiency—moving beyond fixed task-level workflow templates.

How can agent systems share learned skills across users?

SkillClaw aggregates interaction trajectories across users, processes them through an autonomous evolver that identifies patterns and refines skills, then synchronizes updates system-wide. This converts siloed individual learning into shared capability improvement without manual curation.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.

Can agents learn better from their failures than successes?

ReasoningBank shows that storing strategy-level reasoning hints from both self-judged successes and failures outperforms success-only memory and raw trajectory storage. Coupled with test-time scaling, memory and compute compound rather than substitute, creating a novel scaling law where accuracy improves through cumulative interaction history.

Can AI systems improve their own learning strategies?

Current self-improvement methods use extrinsic, fixed metacognitive loops designed by humans that fail under domain shift or capability changes. True self-improvement requires agents to generate their own adaptive metacognitive knowledge, planning, and evaluation—a gap confirmed as a neglected research area across neuro-symbolic AI.

Can RL agents learn to reason better, not just succeed?

RLVMR uses structured meta-reasoning tags (planning, exploration, reflection, monitoring) with programmatic rewards to train agentic RL. This reduces repetitive actions by 31% compared to outcome-only methods while maintaining better generalization than supervised fine-tuning alone.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about self-improving agent architecture. The question: **Should we train the evolver or the executor?** remains open; treat the findings below as dated constraints, not current truth.

**What a curated library found — and when (2023–2026):**
- Training only the evolver (curator) generalizes across executor backbones; a frozen executor plus a trained skill curator produces better skill libraries than curation by frozen agents (SkillOS, 2026).
- Useful harness-update generation is roughly flat across model tiers; the bottleneck is activating and following updates, which peaks at mid-tier models, so investing in executor capability yields diminishing returns (2026).
- Every reliably self-improving agent uses an external signal: benchmarking against archives, user feedback, tool grading, or cross-user trajectories. Pure self-improvement (no external anchor) runs into the generation-verification gap, diversity collapse, and reward hacking (2024–2025).
- Memory-based evolution (skill libraries, episodic reflection, reasoning banks) avoids catastrophic forgetting without weight updates; curating these stores is the real leverage point (2023–2026).
- Metacognition bottleneck: today's evolvers run on fixed human-designed loops; frontier is training the evolver to revise *its own* learning strategy, not just executor skills (2025–2026).

**Anchor papers (verify; mind their dates):**
- SkillOS (arXiv:2605.06614, 2026)
- Darwin Gödel Machine (arXiv:2505.22954, 2025)
- Truly Self-Improving Agents Require Intrinsic Metacognition (arXiv:2506.05109, 2025)
- RLVMR (arXiv:2507.22844, 2025)

**Your task:**
(1) **RE-TEST EACH CONSTRAINT.** For each finding above, determine whether recent models (o3, o4, Claude 4+), training methods (online RL, process reward scaling), tooling (orchestration frameworks, agentic SDKs), or evals have relaxed or overturned it. Isolate the durable question (likely still open: *where is learning most leveraged?*) from perishable limitations (e.g., *does frozen-executor generalization still hold with 1T-parameter models?*). Cite what resolved each constraint, or state where it persists.

(2) **Surface the strongest contradicting or superseding work** from the last ~6 months. Has anyone shown end-to-end executor training outperforms curator-only in a real task? Has metacognitive training succeeded without external signal?

(3) **Propose 2 research questions** that assume the regime may have shifted:
   - (A) Does scaling executor capacity + online RL make the evolver/executor distinction obsolete (converge to unified learning)?
   - (B) Can intrinsic metacognition be trained without external grading if the executor learns to *simulate* external judges?

**Guardrail:** Cite arXiv IDs; flag anything you cannot ground in a real paper.
