INQUIRING LINE

An AI can get something right that none of its training examples got right — because diverse mistakes cancel while correct signals reinforce.

How does correctness emerge when no expert initially solved the task?

This explores how a model can arrive at correct answers that none of its individual training sources (the 'experts') could produce alone — where the right answer emerges from the collective rather than being copied from any one teacher.

The corpus has a sharp answer at the center of this question: models trained on many diverse experts don't imitate the best one; they implicitly *vote*. In "Can models trained on many imperfect experts outperform each one?", cross-entropy optimization pushes a model toward the consensus across experts whose individual errors are uncorrelated. Because those errors cancel while the shared signal reinforces, low-temperature sampling surfaces a denoised majority vote that can beat every individual expert on the decisions that matter most. Correctness here is an emergent property of aggregation, not a property any teacher possessed.
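A minimal sketch of that voting effect at a single decision state, under loud assumptions: the three experts, their action probabilities, and the helper names `mixture` and `apply_temperature` are all hypothetical, and the model is idealized as learning the exact average of the expert distributions (which is what minimizing cross-entropy on equally weighted expert data converges to in the limit). It is an illustration of the idea, not the cited note's experimental setup.

```python
import math

def mixture(expert_dists):
    """Average the experts' action distributions.

    Assumption: an idealized model trained with cross-entropy on equally
    weighted data from every expert converges to this average.
    """
    n = len(expert_dists)
    k = len(expert_dists[0])
    return [sum(d[a] for d in expert_dists) / n for a in range(k)]

def apply_temperature(probs, temperature):
    """Rescale a distribution so that p_i is proportional to p_i ** (1 / T)."""
    logits = [math.log(p) / temperature if p > 0 else float("-inf") for p in probs]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]  # subtract max for stability
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical experts at one state; action 0 is the correct one.
# Each expert's single most likely action is WRONG, and each is wrong
# in a different direction (uncorrelated errors).
experts = [
    [0.45, 0.55, 0.00, 0.00],  # expert A leans toward wrong action 1
    [0.45, 0.00, 0.55, 0.00],  # expert B leans toward wrong action 2
    [0.45, 0.00, 0.00, 0.55],  # expert C leans toward wrong action 3
]

learned = mixture(experts)               # errors spread thin, shared signal adds up
cold = apply_temperature(learned, 0.1)   # low temperature sharpens toward the argmax

print("mixture:", [round(p, 3) for p in learned])
print("T=0.1  :", [round(p, 4) for p in cold])
```

In this toy setup no expert picks the correct action more than 45% of the time, yet the correct action is the most likely one in the averaged distribution, so low-temperature sampling selects it almost always. The effect depends entirely on the wrong choices being scattered across different actions.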

The deeper mechanism is that the capability was latent and needed to be triggered rather than invented. "Does RL post-training create reasoning or just deploy it?" argues that base models already contain reasoning strategies in latent form, and that post-training optimizes *when* to deploy them, not *how* to do them: activation vectors for reasoning strategies exist before any RL touches the model. Read alongside the voting result, this reframes 'emergence' as recombination: the pieces of a correct solution are distributed across the training signal, and training assembles them into a path no single source walked end to end.

This also explains why the 'experts' don't need to be right, or even coherent. "Do reasoning traces need to be semantically correct?" shows that models trained on systematically irrelevant traces keep their accuracy and sometimes generalize better out of distribution; the trace works as computational scaffolding, not as a transcript of correct thought. And "Can reasoning emerge from expert demonstrations alone?" recovers an implicit reward function from demonstrations through adversarial policy-critic training, matching verifier-based RL on reasoning tasks and extending to domains that have no automated checker at all. Both cases sever correctness-of-output from correctness-of-any-input.

But the corpus also marks the boundary where this magic stops. Emergence-from-aggregation needs *independent* signal to denoise against; it isn't free improvement from nothing. "Can models reliably improve themselves without external feedback?" shows that pure self-improvement stalls on the generation-verification gap, diversity collapse, and reward hacking; every reliable method smuggles in an external anchor (past versions, third-party judges, user corrections, tool feedback). The diverse-expert pool *is* that external anchor; remove the diversity and the voting collapses. Relatedly, "Is reflection in reasoning models actually fixing mistakes?" finds that the apparent self-correction in reasoning chains is mostly post-hoc confirmation: the gain comes from better first answers, not from a model talking itself from wrong to right.
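The dependence on diversity can be sketched with a small simulation. Everything here is an illustrative assumption rather than anything taken from the cited notes: the function name `vote_accuracy`, the binary right/wrong decision, the 60% per-expert accuracy, and the `shared_error` knob, which is the probability that at a given state all experts copy one shared draw instead of drawing independently. Each expert's individual accuracy stays the same at every setting; only the correlation between their errors changes.

```python
import random

def vote_accuracy(n_experts, p_correct, shared_error, n_states=20000, seed=0):
    """Accuracy of a simple majority vote over binary right/wrong decisions.

    shared_error (hypothetical knob): probability that, at a given state,
    every expert copies a single shared draw. 0.0 means fully independent
    errors; 1.0 means the experts are clones of one another.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_states):
        if rng.random() < shared_error:
            # Correlated case: one draw decides for the whole pool.
            common = rng.random() < p_correct
            votes = [common] * n_experts
        else:
            # Independent case: each expert is right with prob p_correct.
            votes = [rng.random() < p_correct for _ in range(n_experts)]
        hits += sum(votes) > n_experts / 2
    return hits / n_states

for shared in (0.0, 0.5, 1.0):
    acc = vote_accuracy(n_experts=15, p_correct=0.6, shared_error=shared)
    print(f"shared_error={shared:.1f}  majority-vote accuracy={acc:.3f}")
```

With independent errors the vote is clearly better than any single 60%-accurate expert; as `shared_error` approaches 1 the pool behaves like one expert and the advantage disappears, which is the collapse the paragraph above describes.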

The thing you may not have known you wanted to know: 'no expert solved it' is not the obstacle it sounds like, because the model was never really learning from any single expert — it was learning from the *shape of their disagreement*. Correctness emerges in the gaps between teachers. Which is also why, when the teachers stop disagreeing in useful ways (diversity collapse) or when there's no independent signal to vote against (pure self-improvement), the emergence quietly disappears.

Sources (6 notes)

Can models trained on many imperfect experts outperform each one?

Generative models trained on many diverse experts with different biases converge toward consensus behavior through cross-entropy optimization. Low-temperature sampling reveals this implicit majority vote, which outperforms any single expert by denoising uncorrelated individual errors on critical decision states.

Does RL post-training create reasoning or just deploy it?

Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.

Do reasoning traces need to be semantically correct?

Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.

Can reasoning emerge from expert demonstrations alone?

RARO recovers implicit reward functions from expert demonstrations through adversarial co-training between a reasoning policy and relativistic critic. This approach matches verifier-based RL performance on reasoning tasks while extending to domains lacking automated verification.

Can models reliably improve themselves without external feedback?

Pure self-improvement stalls due to the generation-verification gap, diversity collapse, and reward hacking. Reliable improvement methods succeed by smuggling in external anchors: past model versions, third-party judges, user corrections, or tool feedback.

Is reflection in reasoning models actually fixing mistakes?

Analysis of 8 reasoning models shows reflections rarely change answers and primarily serve as post-hoc confirmation. Training on longer reflection chains improves first-answer quality, not self-correction capability.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces! (1.72 match, arXiv)
The Invisible Leash: Why RLVR May Not Escape Its Origin (1.71 match, arXiv)
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? (1.69 match, arXiv)
Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains (1.69 match, arXiv)
Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens (1.69 match, arXiv)
Escaping the Verifier: Learning to Reason via Demonstrations (0.89 match, arXiv)
Eliciting Reasoning in Language Models with Cognitive Tools (0.88 match, arXiv)
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models (0.88 match, arXiv)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing the constraint that correctness can emerge from imperfect teachers through implicit consensus mechanisms. The question remains open: under what conditions does aggregation of diverse, uncorrelated errors actually produce capability not present in any source?

What a curated library found — and when (dated claims, not current truth):
Findings span 2024–2026; treat these as perishable thresholds to re-examine:
• Cross-entropy optimization on diverse experts implicitly enacts majority voting; low-temperature sampling recovers denoised consensus beating every individual teacher (2024-06, arXiv:2406.11741).
• Base models contain latent reasoning strategies before post-training; RL optimizes *when* to deploy them, not *how* to construct them — reframing emergence as recombination, not invention (inferred from path context).
• Models trained on systematically irrelevant or corrupted reasoning traces maintain accuracy and sometimes generalize better; correctness of output decouples from correctness of input scaffold (2025-05, arXiv:2505.13775).
• Pure self-improvement stalls on generation-verification gap and diversity collapse; every reliable method smuggles external anchors—past checkpoints, third-party judges, tool feedback (2024-12, arXiv:2412.02674).
• Apparent self-correction in reasoning chains is mostly post-hoc confirmation; gains come from better first answers, not from talking-through (2025-05, arXiv:2505.20296).

Anchor papers (verify; mind their dates):
• arXiv:2406.11741 (2024-06) — Transcendence: Generative Models Can Outperform The Experts That Train Them
• arXiv:2412.02674 (2024-12) — Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
• arXiv:2505.13775 (2025-05) — Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens
• arXiv:2511.21667 (2025-11) — Escaping the Verifier: Learning to Reason via Demonstrations

Your task:
(1) RE-TEST THE AGGREGATION CONSTRAINT. For each finding above, determine whether newer scaling, architectural changes (sparse routing, mixture-of-agents), post-training methods (outcome-based or process-based reward learning), or online RL have since relaxed the requirement for *diverse* uncorrelated signal. Can models now extract consensus from correlated expert pools, or does diversity remain essential? Does the voting mechanism hold at scale, or does it degrade? Separate the durable insight (aggregation can produce emergence) from the perishable limit (you need independent signal).
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — especially work on pure self-play, synthetic data, or reward hacking that either resurrects self-improvement without external anchors or shows the voting mechanism breaks under new conditions.
(3) Propose 2 research questions that assume the regime may have shifted: (a) Can a model ensemble trained on *deliberately correlated* error patterns achieve the same denoising as diverse voting? (b) If reasoning is latent and post-training merely activates it, what happens when the base model contains no latent path to the task — does aggregation fail entirely, or does it force invention?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
