INQUIRING LINE

If an AI gets better by training on its own outputs, does it quietly become less versatile at everything else?

Does self-generated training data reduce a model's capability diversity?

This explores whether feeding a model its own outputs as training data narrows the range of things it can do — collapsing the spread of formats, solutions, and styles it once had — even when that self-generated data improves accuracy.

The tension here is between two things the corpus treats as separate but that turn out to be linked: self-generated training data can make a model *better* while quietly making it *narrower*. The cleanest evidence for the upside is SEAL, where models learned knowledge more effectively from data they generated themselves than from data produced by a stronger external teacher: QA accuracy rose from 33.5% to 47.0% (see: Does self-generated training data improve model learning?). The intuition is that a model restructures information into a shape that fits its own representations, which a teacher can't do for it (see: Does teacher-refined data always improve student model performance?). So self-generated data isn't a degraded substitute; it's sometimes the better fuel.

But 'better at the target' and 'diverse' aren't the same axis, and that's where the corpus gets interesting. Training a model on a distribution it already favors tends to amplify that favorite and suppress the alternatives. RL post-training, for example, locks onto a single dominant format inherited from pretraining within the first epoch and collapses the others, and the winning format depends on model scale, not necessarily on which one performs best (see: Does RL training collapse format diversity in pretrained models?). Post-training also closes a feedback loop where the model starts treating its own outputs as its next inputs, which shows up as 3–4x lower output entropy on-policy (see: Do models recognize their own outputs as actions shaping future inputs?). Lower entropy is what 'reduced capability diversity' looks like from the outside: the model keeps reaching for the same moves.
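To make 'lower output entropy' concrete, here is a minimal sketch that computes Shannon entropy over the formats of sampled responses. The format labels and counts are hypothetical, chosen only to illustrate what a drop of that rough size looks like; they are not data from the cited notes, and the notes may measure entropy differently (for example, over tokens and not format labels).

```python
import math
from collections import Counter

def shannon_entropy(samples):
    """Entropy in bits of the empirical distribution over `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical format labels for 8 sampled responses (illustrative only).
before = ["prose", "code", "latex", "bullets"] * 2   # spread evenly over 4 formats
after = ["prose"] * 7 + ["code"]                     # one format dominates

h_before = shannon_entropy(before)  # 2.0 bits: uniform over 4 formats
h_after = shannon_entropy(after)    # about 0.54 bits
ratio = h_before / h_after          # between 3x and 4x lower in this toy case
```

In this toy case the model has not lost the ability to emit the other formats in principle; it simply almost never does, which is why entropy is a useful outside-view signal.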

The sharpest version of the worry is the 'Artificial Hivemind.' Across 70+ models and 26K open-ended prompts, different models independently converged on strikingly similar, sometimes identical, responses, because they share overlapping training data and alignment recipes (see: Do different AI models actually produce diverse outputs?). If self-generated data becomes a bigger share of what models train on, this is the failure mode that compounds: a model recycling its own most-probable outputs has no outside source of variety to pull it back toward the tails.
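The compounding can be illustrated with a toy simulation. This is a sketch under strong assumptions, not a model of any cited experiment: the 'model' is just a categorical distribution over 20 hypothetical response types, and each round of 'self-training' refits it to a finite sample of its own outputs. Once a rare response type fails to appear in a sample, its probability becomes zero and nothing inside the loop can bring it back.

```python
import random

def self_train_round(probs, n_samples, rng):
    """Sample from the current model, then refit it to its own samples."""
    outputs = rng.choices(range(len(probs)), weights=probs, k=n_samples)
    return [outputs.count(i) / n_samples for i in range(len(probs))]

def support_size(probs):
    """Number of response types the model can still produce."""
    return sum(1 for p in probs if p > 0)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical starting model: long-tailed distribution over 20 response types.
weights = [1 / (rank + 1) for rank in range(20)]
probs = [w / sum(weights) for w in weights]

history = [support_size(probs)]
for _ in range(30):
    probs = self_train_round(probs, n_samples=50, rng=rng)
    history.append(support_size(probs))

# `history` never increases: a lost tail has no route back inside the loop.
```

Mixing in even a small share of samples from an outside distribution each round would break that one-way property, which is the toy analogue of an external source of variety.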

The corpus doesn't treat this as inevitable, though, and that's the part worth knowing. Whether self-training narrows a model depends on what the training rewards and whether something interrupts the convergence. Preference tuning *reduces* lexical diversity in code (where outputs converge toward correct solutions) but *increases* it in creative writing (where distinctiveness is rewarded), so the direction flips by domain (see: Does preference tuning always reduce diversity the same way?). Adding a critique step inside the self-training loop actively counteracts 'tail narrowing' and preserves solution diversity across iterations; the authors argue this is a more fundamental win than the test-time accuracy bump (see: Do critique models improve diversity during training itself?). Training order matters too: doing structured tasks before open-ended ones prevents entropy collapse from spilling over and damaging creative capability (see: Does training order reshape how models handle different task types?).
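For a sense of how 'lexical diversity' can be measured, the sketch below uses distinct-n, the share of n-grams in a set of samples that are unique. This is one common proxy and an assumption here; the cited note does not say which metric it used. The sample strings are hypothetical.

```python
def distinct_n(texts, n=2):
    """Fraction of n-grams across `texts` that are unique (higher = more diverse)."""
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

# Hypothetical samples: converged code-like outputs vs. varied creative lines.
code_like = ["return a + b", "return a + b", "return a + b"]
creative = ["the sea forgot its name", "rust blooms on quiet rails", "a door hums at noon"]

code_diversity = distinct_n(code_like)      # 3 unique bigrams out of 9
creative_diversity = distinct_n(creative)   # all 12 bigrams are unique
```

A low score on the code samples is not necessarily a defect, since convergence on a correct answer is what that domain rewards; the same score on creative samples would be a warning sign.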

The deeper reason all of this stays bounded: on the corpus's account, self-generated data can't add capability the model didn't already have. Post-training *selects* from latent ability rather than creating it (see: Do base models already contain hidden reasoning ability?), and self-improvement is formally capped by the generation–verification gap, so every reliable gain needs something external to validate it (see: What stops large language models from improving themselves?). So the honest answer is: self-generated data tends to reduce diversity by default, because it concentrates probability mass on what the model already prefers. That is a tendency you can fight with critique loops, domain-aware rewards, and scheduling, not a law you're stuck with.

Sources (10 notes)

Does self-generated training data improve model learning?

SEAL demonstrates that models learn better from synthetic data they generate themselves than from data created by stronger external models. Self-generated data improved QA performance from 33.5% to 47.0%, suggesting that model-specific restructuring aligns with the learner's representational needs.

Does teacher-refined data always improve student model performance?

Teacher-refined data degrades performance when it exceeds the student's learning frontier, even if objectively higher quality. Students should filter refinements using their own statistical profile to retain only compatible improvements.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Do models recognize their own outputs as actions shaping future inputs?

Post-trained language models exhibit a measurable shift where they recognize their outputs become their own future inputs, closing an action-perception loop absent in pretraining. Evidence includes 3-4x lower output entropy on-policy and behavioral signatures of trajectory recognition.

Do different AI models actually produce diverse outputs?

INFINITY-CHAT analyzed 70+ models across 26K open-ended queries and found an "Artificial Hivemind" effect: models independently generate strikingly similar or identical responses due to overlapping training data and alignment procedures, undermining the diversity benefits of model ensembles.

Does preference tuning always reduce diversity the same way?

RLHF reduces lexical-syntactic diversity in code generation but increases it in creative writing. The direction depends on what each domain incentivizes: code rewards convergence toward correct solutions, while creative writing rewards stylistic distinctiveness.

Do critique models improve diversity during training itself?

Step-level critique in the training loop counteracts tail narrowing and maintains solution diversity across self-training iterations. This training-time benefit—preventing premature convergence—is more fundamental than test-time accuracy gains.

Does training order reshape how models handle different task types?

Omni-Thinker shows structured domains decrease output entropy while creative domains increase it. BWT-guided scheduling—training structured tasks first—yields 6.2% gains over joint training by preventing entropy collapse from damaging open-ended capabilities.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

What stops large language models from improving themselves?

Self-improvement in LLMs is formally bounded by the generation-verification gap, meaning every reliable fix requires something external to validate and enforce it. Models cannot escape this constraint through metacognition alone.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining (3.31 match)
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models (2.51 match)
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models (2.49 match)
Evaluating the Diversity and Quality of LLM Generated Content (1.71 match)
Eliciting Reasoning in Language Models with Cognitive Tools (1.70 match)
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning (1.69 match)
NoveltyBench: Evaluating Language Models for Humanlike Diversity (1.67 match)
Post-training makes large language models less human-like (1.63 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher evaluating whether self-generated training data reduces capability diversity—a question a curated library (2024–2026) treats as OPEN and CONTESTED, not settled. Treat all findings below as dated claims to be re-tested against the latest models and methods.

What a curated library found — and when (dated claims, not current truth):
• SEAL (2024): Self-generated QA data outperformed external teacher data (33.5% → 47.0% accuracy), because models reshape information into their own representations (2024–2025).
• RL post-training locks onto a single dominant format within epoch one and collapses alternatives; on-policy output entropy drops 3–4x as models treat their own outputs as next inputs (2025).
• Across 70+ models and 26K prompts, different models independently converged on strikingly similar, sometimes identical, responses (the 'Artificial Hivemind' risk) because shared training data and alignment recipes create overlap (2025).
• Preference tuning REDUCES lexical diversity in code (one right answer) but INCREASES it in creative writing (distinctiveness rewarded); direction is domain-dependent (2025).
• Critique loops during self-training actively preserve solution diversity and counter tail-narrowing; structured-then-open-ended task ordering prevents entropy collapse spillover (2025–2026).

Anchor papers (verify; mind their dates):
• arXiv:2504.07912 (Echo Chamber: RL Post-training, 2025)
• arXiv:2510.22954 (Artificial Hivemind, 2025)
• arXiv:2411.16579 (Critique Models, 2024)
• arXiv:2605.25459 (Enaction, 2026)

Your task:
(1) RE-TEST EACH CONSTRAINT. For the entropy collapse, format-locking, and convergence findings: have newer models (o3, GPT-4o, Llama 3.3) or post-training methods (DPO, IPO, newer RL harnesses) since RELAXED these limits? Separate the durable question (does self-generation favor concentration?) from the perishable limitation (does it necessarily cause it?). Cite what relaxed it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months—especially anything showing self-generated data *preserves* diversity without external critique.
(3) Propose 2 research questions that ASSUME the regime may have shifted: (a) Does ensemble decoding or mixture-of-experts routing intrinsically resist format-locking? (b) Can contrastive self-generation (model generates *and* *anti-generates*) break the convergence trap?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
