SYNTHESIS NOTE
Training, RL, and Test-Time Scaling Reasoning, Retrieval, and Evaluation Model Architecture and Internals

Does self-generated training data improve model learning?

Can models learn more effectively from training data they generate themselves rather than data created by external sources? This explores whether a learner's own restructuring process produces better learning outcomes.

Synthesis note · 2026-02-22 · sourced from Self Refinement Self Consistency Feedback
How should we allocate compute budget at inference time? How do you build domain expertise into general AI models?

SEAL (Self-Adapting Language Models) equips LLMs with the ability to generate "self-edits" — natural-language instructions that specify both the training data and optimization hyperparameters for updating the model's own weights. Given new factual knowledge to incorporate, instead of finetuning directly on the source text, the model generates its own synthetic training data optimized for self-learning.

The results are counter-intuitive: finetuning on self-generated data improves no-passage QA performance from 33.5% to 47.0%, outperforming data generated by GPT-4.1. A weaker model's self-generated data produces better learning outcomes than a stronger model's externally generated data.

The analogy to human learning is precise: students who rewrite lecture material in their own words consistently outperform students who study the original text. The restructuring process is itself the learning — it forces the learner to identify gaps, reframe concepts in familiar terms, and create connections to existing knowledge. Different learners restructure differently (visual diagrams, text summaries, mathematical formulations) because the optimal transformation depends on the learner's representational structure, not just the content.

For LLMs, this means the model's own distributional characteristics determine what data format will produce effective weight updates. A model with particular learned representations will learn more from data that aligns with those representations than from data optimized for a different model's internal structure.

The method uses RL to train the self-edit capability: the downstream performance of the updated model serves as the reward signal. This means the model learns not just what to study but how to study — selecting augmentation strategies and optimization hyperparameters alongside content.

On a simplified ARC-AGI subset, SEAL also outperforms both standard in-context learning and self-editing without RL training, showing that the quality of self-generated data improves with the RL-trained meta-learning capability.

Two converging methods from alignment research reinforce the self-generation principle. First, instruction backtranslation (Humpback) trains an LLM to generate instructions for unlabeled web text, then self-selects high-quality pairs through iterative curation — the model generates its own training signal and curates it. Second, Can aligned LLMs generate their own training data? (MAGPIE) shows that aligned models can generate 4 million instruction-response pairs from their pre-query template alone, outperforming human-curated datasets. Both methods demonstrate the same principle at different levels: self-generated data captures the model's own distributional preferences, producing more learnable training signal than external generation.

This extends Does training data format shape reasoning strategy more than domain? to training data generation: not just the format of training data matters, but who generates it. And it provides a constructive mechanism for Does teacher-refined data always improve student model performance? — the ideal "teacher" for data refinement is the student model itself.

Inquiring lines that use this note as a source 9

This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.

Related concepts in this collection 5

This note in its neighbourhood — explore the map, then jump to a related concept in the list below.

Concept map
17 direct connections · 197 in 2-hop network ·dense cluster Open in graph ↗

Click a node to walk · click center to open · click Open in graph to see this note in the full knowledge graph

your link semantically near linked from elsewhere

Related papers in this collection 8

Papers most semantically related to this note, ranked by cosine similarity in the embedding space.

Original note title

self-generated training data outperforms externally generated data for knowledge incorporation because model-specific restructuring matches the learner's representational needs