Can transformers improve exponentially by learning from their own correct solutions?
Can standard transformers achieve extreme length generalization by iteratively filtering and training on their own correct outputs? This explores whether self-improvement loops enable unbounded out-of-distribution improvement without architectural changes.
"Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges" (2502.01612) demonstrates that standard transformer architectures can achieve extreme out-of-distribution generalization through a self-improvement loop: generate solutions, filter for correctness, train on the correct ones, repeat.
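The four-step loop (generate, filter, train, repeat) can be sketched as a toy simulation. This is not the paper's code, and it runs under loudly stated assumptions: `ToyAdder` is a hypothetical stand-in for a transformer that adds reliably up to `max_digits` digits and, to mimic sampling variance, gets one-digit-longer problems right with probability `p_stretch`. "Training" is faked by extending the model's reach to the longest length it produced a verified answer for. The names `ToyAdder`, `p_stretch`, `is_correct`, `random_problem`, and `self_improve` are all illustrative.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyAdder:
    """Hypothetical stand-in for a transformer (NOT the paper's model).

    Always correct up to `max_digits` digits; correct with probability
    `p_stretch` on problems exactly one digit longer; wrong beyond that.
    """
    max_digits: int
    p_stretch: float = 0.3

    def generate(self, a: int, b: int, rng: random.Random) -> int:
        n = max(len(str(a)), len(str(b)))
        if n <= self.max_digits:
            return a + b
        if n == self.max_digits + 1 and rng.random() < self.p_stretch:
            return a + b
        return a + b + rng.randint(1, 9)  # plausible-looking wrong answer


def is_correct(a: int, b: int, answer: int) -> bool:
    """Automated correctness check: exact and cheap for addition."""
    return answer == a + b


def random_problem(digits: int, rng: random.Random) -> tuple[int, int]:
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    return rng.randint(lo, hi), rng.randint(lo, hi)


def self_improve(model: ToyAdder, rounds: int, per_round: int, seed: int = 0) -> ToyAdder:
    rng = random.Random(seed)
    for _ in range(rounds):
        # 1. Generate: attempt problems one digit beyond the current reach.
        problems = [random_problem(model.max_digits + 1, rng) for _ in range(per_round)]
        attempts = [(a, b, model.generate(a, b, rng)) for a, b in problems]
        # 2. Filter: keep only self-generated solutions that pass the check.
        kept = [(a, b, ans) for a, b, ans in attempts if is_correct(a, b, ans)]
        # 3. Train: toy stand-in for fine-tuning on the kept examples; the
        #    model now covers the longest length it solved correctly.
        if kept:
            longest = max(max(len(str(a)), len(str(b))) for a, b, _ in kept)
            model = ToyAdder(max(model.max_digits, longest), model.p_stretch)
        # 4. Repeat with the updated model.
    return model


if __name__ == "__main__":
    final = self_improve(ToyAdder(max_digits=10), rounds=5, per_round=20)
    print("reach after 5 rounds:", final.max_digits, "digits")
```

Setting `p_stretch=0.0` removes the sampling variance, and the loop never gets past its starting length, which mirrors the note's point that occasional correct samples on harder problems supply the training signal.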
The results across arithmetic, string manipulation, and maze solving show generalization far beyond the training distribution: addition extends from 10-digit to 100-digit problems with no apparent saturation. The critical mechanism is filtering for correct self-generated examples, which produces exponential improvement in OOD performance across training rounds. Not linear. Exponential.
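For a feel of why the linear/exponential distinction matters, here is back-of-the-envelope arithmetic, not data from the paper. It counts how many rounds it takes to stretch reach from 10 to 100 digits under two hypothetical growth rules: +1 digit per round versus x1.5 per round. The function name `rounds_to_reach` and both growth rates are illustrative assumptions.

```python
from typing import Callable


def rounds_to_reach(start: int, target: int, step: Callable[[int], int]) -> int:
    """Count rounds until the reach (in digits) meets `target`.

    Assumes `step` strictly increases the reach; otherwise this never ends.
    """
    reach, rounds = start, 0
    while reach < target:
        reach = step(reach)
        rounds += 1
    return rounds


# Hypothetical growth rules, chosen only for illustration.
linear = rounds_to_reach(10, 100, lambda d: d + 1)          # +1 digit per round
geometric = rounds_to_reach(10, 100, lambda d: d * 3 // 2)  # x1.5 per round (integer)

print(f"linear: {linear} rounds, exponential: {geometric} rounds")
# → linear: 90 rounds, exponential: 6 rounds
```

The geometric rule passes 100 digits in 6 rounds (10, 15, 22, 33, 49, 73, 109), while the linear rule needs 90.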
This is achieved without any modification to the base transformer architecture. No external verifiers beyond a correctness check. No curriculum design. No reward models. The model's own ability to occasionally solve harder problems (via sampling variance) provides the training signal for the next round. The correctness filter is the critical factor that distinguishes this from error avalanching (see "How quickly do errors compound during model self-training?"): without verification, small errors compound exponentially in the wrong direction; with verification, correct solutions compound exponentially in the right direction.
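A toy calculation shows what compounding in the wrong direction looks like, and why a filter stops it. It assumes (hypothetically) that each round the model makes fresh mistakes at a fixed rate `fresh_error` and permanently reproduces any error it was trained on; the function name and parameters are illustrative, not taken from the paper.

```python
def output_error_by_round(rounds: int, fresh_error: float, filtered: bool) -> list[float]:
    """Toy model of self-training error rates, round by round.

    filtered=False: every output is trained on, so errors are inherited.
    filtered=True:  a perfect checker drops wrong outputs before training.
    """
    inherited = 0.0  # error rate baked into the model by its training data
    history = []
    for _ in range(rounds):
        # An output is wrong if it repeats an inherited error or adds a new one.
        output_error = inherited + (1.0 - inherited) * fresh_error
        history.append(output_error)
        inherited = 0.0 if filtered else output_error
    return history


unverified = output_error_by_round(5, fresh_error=0.1, filtered=False)
verified = output_error_by_round(5, fresh_error=0.1, filtered=True)
print([round(e, 3) for e in unverified])
print([round(e, 3) for e in verified])
```

Without filtering, the error rate after t rounds is 1 - (1 - e)^t, so accuracy decays geometrically. With a perfect filter, it stays flat at the fresh-error rate, because wrong outputs never become training data.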
The finding directly challenges the generation-verification gap argument (see "What limits how much models can improve themselves?"). That argument holds that self-improvement is bounded because a model cannot verify better than it generates. But for tasks with automated verification (arithmetic, string manipulation), verification is exact, so the gap vanishes. This is exactly the class of tasks where self-improvement shows no apparent bound.
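To see why the gap vanishes for these tasks, consider what checkers for two of them might look like. This is a sketch assuming hypothetical formats (answers as decimal strings; mazes as rows of '.' and '#'; paths as lists of (row, col) cells), not the paper's data format, and the function names are illustrative. Both checks are exact and do not depend on how good the generator is.

```python
Cell = tuple[int, int]


def check_addition(a: int, b: int, answer: str) -> bool:
    """Exact check of a model's textual answer to a + b."""
    text = answer.strip()
    return text.isdecimal() and int(text) == a + b


def check_maze_path(grid: list[str], start: Cell, goal: Cell, path: list[Cell]) -> bool:
    """Validate a proposed path on a grid of '.' (open) and '#' (wall).

    Checking a path takes one linear pass; finding one requires search.
    """
    if not path or path[0] != start or path[-1] != goal:
        return False
    rows, cols = len(grid), len(grid[0])
    for r, c in path:
        if not (0 <= r < rows and 0 <= c < cols) or grid[r][c] == "#":
            return False
    # Every move must go to an orthogonally adjacent cell.
    return all(abs(r1 - r2) + abs(c1 - c2) == 1
               for (r1, c1), (r2, c2) in zip(path, path[1:]))


maze = ["..#",
        "#..",
        "##."]
print(check_addition(123, 877, "1000"))  # True
print(check_maze_path(maze, (0, 0), (2, 2), [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]))  # True
print(check_maze_path(maze, (0, 0), (2, 2), [(0, 0), (1, 0), (1, 1), (1, 2), (2, 2)]))  # False: (1, 0) is a wall
```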
Where self-play setups pit a question-generating model against a solving model (see "Can language models improve themselves without any external training data?"), the self-improving transformer uses a different but related mechanism: the same model serves as both proposer (generating candidate solutions at harder scales) and solver (learning from its own correct solutions). The asymmetry comes from the fact that generating one correct solution to a harder problem is easier than reliably solving all harder problems.
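The asymmetry can be quantified under a simple assumption: each sample on a harder problem is independently correct with some small probability p (0.05 here is a hypothetical value, not a figure from the paper). The helper names are illustrative.

```python
import math


def p_at_least_one_correct(p: float, k: int) -> float:
    """Chance that at least one of k independent samples is correct."""
    return 1.0 - (1.0 - p) ** k


def log10_p_all_correct(p: float, n: int) -> float:
    """log10 of the chance of solving all n problems, one attempt each.

    Returned in log form because p ** n underflows to 0.0 for large n.
    """
    return n * math.log10(p)


p = 0.05  # hypothetical per-sample success rate just beyond the training range
print(f"one problem, 64 samples, P(any correct): {p_at_least_one_correct(p, 64):.3f}")
print(f"expected verified examples from 1000 single attempts: {p * 1000:.0f}")
print(f"P(all 1000 solved) is about 10^{log10_p_all_correct(p, 1000):.0f}")
```

Reliably solving every harder problem is hopeless at this p, but sampling still yields a steady trickle of verified examples, which is all a filter-and-train loop needs.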
The exponential improvement finding may help explain findings that a single training example can unlock mathematical reasoning (see "Can a single training example unlock mathematical reasoning?"). If one correct example at the boundary can seed an exponential self-improvement cascade, then the minimal signal needed for activation is genuinely minimal.
Inquiring lines that use this note as a source
- Do transformers learn generalizable algorithms or instance-based patterns?
- Can universal function approximators be expensive to learn in practice?
- How does distribution mismatch between training and deployment break self-correction?
- What makes self-modifying architectures learn their own update rules?
- Why do error avalanches accelerate in self-training loops without verification?
- How does error propagation limit transformer performance on complex tasks?
- Why do energy-based models generalize better on out-of-distribution data than standard transformers?
- Can self-consistency checks fully prevent error avalanching in self-training loops?
- How does self-distillation differ from standard fine-tuning approaches?
- Why do standard transformers fail on problems requiring serial algorithmic reasoning?
- Why do standard transformers fail to encode recursive structure in their hidden states?
- Why does optimizing only quality cause model collapse in self-improvement loops?
- Can transformers reason beyond fixed architectural depth limits?
- Why does filtering for correct examples prevent error compounding in self-training?
- How do transformers generate harder solutions when mostly trained on easier problems?
- Can bounded-depth transformers solve inherently sequential problems?
- How does error avalanching compound failures in self-training iterations?
- Can a model evaluate its own improvements without degrading over iterations?
- How do transformers stitch together learned behaviors when adapting to new tasks?
- Do transformer architectures structurally bias models toward short-term optimization?
- What four domain properties make self-healing failure loops actually work?
- Why does looping computation outperform adding more transformer layers?
- Can recurrent transformers learn genuinely new computations beyond inference stages?
- Can looping enable reasoning capabilities that fixed-depth transformers fundamentally cannot achieve?
Related concepts in this collection
- What limits how much models can improve themselves?
  Explores whether self-improvement has fundamental boundaries set by how well models can verify versus generate solutions, and what this means across different task types.
  self-improving transformers exploit the vanishing gap for verifiable tasks
- Can language models improve themselves without any external training data?
  Explores whether two language models playing against each other (one generating questions, one solving them) can create a self-improving loop. Matters because it would eliminate dependence on human-labeled datasets.
  related self-improvement mechanism
- Can a single training example unlock mathematical reasoning?
  Explores whether one example is enough to dramatically improve math problem-solving in language models, and whether learning continues after perfect memorization.
  exponential cascade may explain minimal activation thresholds
- How quickly do errors compound during model self-training?
  When LLMs train on their own outputs without verification, do small mistakes amplify exponentially? This matters because it determines whether unsupervised self-improvement is even feasible.
  direct tension: error avalanching predicts that self-training collapses rapidly, while self-improving transformers achieve exponential improvement. The resolution is verification quality: self-improving transformers filter for correctness using automated verification (arithmetic, string matching), which prevents error accumulation, whereas error avalanching occurs when self-training uses unverified outputs and small errors compound. The boundary between self-improvement and error avalanching is the verification gap.
- Can AI systems improve themselves through trial and error?
  Explores whether replacing formal proof requirements with empirical benchmark testing enables AI systems to successfully modify and improve their own code iteratively, and what mechanisms prevent compounding failures.
  extends self-improvement from task-specific domains (arithmetic, string manipulation) to general code-writing capability. The Darwin Gödel Machine's (DGM) evolutionary archive enables open-ended exploration, while self-improving transformers follow a single improvement trajectory: population diversity and correctness filtering are alternative mechanisms for sustaining improvement.
Related papers in this collection
- Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
- Extrapolation by Association: Length Generalization Transfer in Transformers
- Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
- Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers
- A Mechanistic Analysis of Looped Reasoning Language Models
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
- RL + Transformer = A General-Purpose Problem Solver
- Progress Measures For Grokking Via Mechanistic Interpretability
Original note title
self-improving transformers achieve extreme length generalization through iterative self-generated solutions with exponential out-of-distribution improvement