INQUIRING LINE

One training example can double a model's math score — not because it learned anything new, but because the skill was already inside it.

How do single training examples activate reasoning capabilities in language models?

This explores the surprising finding that a single training example can switch on math reasoning in a language model — and why a tiny signal produces such an outsized effect.

One or a handful of training examples can dramatically improve reasoning, and the corpus has a clear throughline: the training isn't *teaching* reasoning, it's *unlocking* reasoning the model already had. The headline result is striking: in RLVR (reinforcement learning with verifiable rewards), a single math example pushes performance from 36% to 73.6%, and test accuracy keeps climbing for 1,400 steps after training accuracy on that one example has reached 100% (see "Can a single training example unlock mathematical reasoning?"). That pattern only makes sense if the example is a trigger, not a lesson. The broader claim is that base models already carry latent reasoning in their activations, and five independent methods (RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR) all converge on the same conclusion: post-training *selects* reasoning rather than creating it (see "Do base models already contain hidden reasoning ability?").

Why can one example do so much? Part of the answer is that the learning signal is concentrated in very few places. Only about 20% of tokens are high-entropy 'forking points' where the model genuinely decides where the reasoning goes, and training on just those tokens matches or exceeds full-gradient training (see "Do high-entropy tokens drive reasoning model improvements?"). So a single rich example can supply enough signal at the decision points that actually matter. This reframes 'activation' as nudging the model toward reasoning trajectories it could already produce but wasn't selecting by default. The same logic shows up elsewhere: prompt optimization can retrieve and reorganize existing knowledge but cannot inject anything new, so there is a hard ceiling at the edge of the training distribution (see "Can prompt optimization teach models knowledge they lack?"). Activation and elicitation are the operative verbs across this whole territory.
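To make the 'forking point' idea concrete, here is a minimal sketch of how high-entropy positions might be picked out and used to mask a per-token loss. It is an illustration under stated assumptions, not the method of the cited work: the function names (`token_entropy`, `forking_mask`, `masked_loss`), the toy probabilities, and the choice to rank positions within a single sequence are all hypothetical. Only the roughly 20% fraction comes from the note.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def forking_mask(step_probs, keep_fraction=0.2):
    """Flag the highest-entropy positions; True means the position is kept."""
    entropies = [token_entropy(p) for p in step_probs]
    n_keep = max(1, round(keep_fraction * len(entropies)))
    # Rank positions by entropy, highest first, and keep the top n_keep.
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    top = set(ranked[:n_keep])
    return [i in top for i in range(len(entropies))]

def masked_loss(token_losses, mask):
    """Average the per-token loss over kept positions only."""
    kept = [loss for loss, keep in zip(token_losses, mask) if keep]
    return sum(kept) / len(kept)

# Toy sequence: five positions over a 3-token vocabulary (made-up numbers).
step_probs = [
    [0.98, 0.01, 0.01],    # near-deterministic continuation
    [0.34, 0.33, 0.33],    # genuine fork: the model is undecided
    [0.90, 0.05, 0.05],
    [0.97, 0.02, 0.01],
    [0.99, 0.005, 0.005],
]
mask = forking_mask(step_probs)
print(mask)
```

In this toy case only the second position is kept, so a loss averaged through `masked_loss` would be driven entirely by the one place where the model is actually choosing between continuations.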

The uncomfortable flip side is what 'activation' does *not* mean. If one example unlocked genuine general reasoning, you'd expect it to transfer broadly, but several notes suggest that what gets unlocked is closer to pattern reproduction than abstract inference. Chain-of-thought degrades predictably under distribution shift, the signature of imitating reasoning *form* rather than performing it (see "Does chain-of-thought reasoning reveal genuine inference or pattern matching?"). Models lean on semantic associations, not symbolic logic: strip the familiar semantics and performance collapses even with the correct rules in context (see "Do large language models reason symbolically or semantically?"). And failures track *instance novelty*, not task complexity: a reasoning chain succeeds if the model was trained on similar instances, regardless of length (see "Do language models fail at reasoning due to complexity or novelty?"). So the single example activates a capability bounded by the training distribution: powerful within it, brittle outside it.

There's a deeper doorway here worth opening. You may not even need labeled examples: reasoning can emerge as a *side effect* of ordinary language modeling. Quiet-STaR trains the model to generate a rationale at every token on arbitrary internet text, judged only by whether it improves prediction (see "Can models learn reasoning from predicting any text?"). That's the most extreme version of the corpus's thesis: reasoning isn't a skill you install, it's a latent competence that the right training signal surfaces. The thesis also raises a sharp caution flag: if traces are stylistic mimicry where invalid logical steps perform nearly as well as valid ones (see "Do reasoning traces show how models actually think?"), then 'activated reasoning' may sometimes be the activated *appearance* of reasoning. The single-example result is real and remarkable, but what it reveals is less that we can teach reasoning cheaply and more that the reasoning was sitting there all along, waiting for a key.
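The phrase "judged only by whether it improves prediction" can be sketched as a simple scoring rule. This is a simplified illustration, not Quiet-STaR's actual objective: the function name `rationale_reward`, the choice of the no-rationale probability as the baseline, and every number below are assumptions made for the example.

```python
import math

def rationale_reward(p_with, p_without):
    """Log-likelihood gain on the true next token from conditioning on a rationale."""
    return math.log(p_with) - math.log(p_without)

# Hypothetical probabilities the model assigns to the true next token.
baseline = 0.30                      # assumed: prediction with no rationale
candidates = {
    "helpful": 0.60,                 # this rationale makes the true token likelier
    "neutral": 0.30,                 # no change from the baseline
    "distracting": 0.10,             # this rationale hurts prediction
}

rewards = {name: rationale_reward(p, baseline) for name, p in candidates.items()}

# Only rationales that improve prediction would be reinforced.
reinforced = [name for name, reward in rewards.items() if reward > 0]
print(reinforced)
```

No label says whether a rationale is logically valid. The score depends only on whether the text that follows becomes easier to predict, which is also why the caution about persuasive but invalid traces applies.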

Sources 9 notes

Can a single training example unlock mathematical reasoning?

A single example in RLVR boosts math performance from 36% to 73.6% and enables test accuracy to improve for 1,400 steps after training accuracy reaches 100%, revealing that minimal activation signals unlock latent reasoning capability.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Do high-entropy tokens drive reasoning model improvements?

Only ~20% of tokens exhibit high entropy as pivotal reasoning decision points; RLVR primarily adjusts these forking tokens. Training exclusively on them matches or exceeds full-gradient performance, revealing that the minority carries the learning signal.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Do language models fail at reasoning due to complexity or novelty?

LRMs don't break at complexity thresholds but at instance-novelty boundaries. Models fit instance-based patterns rather than generalizable algorithms, so any reasoning chain succeeds if trained on similar instances, regardless of length.

Can models learn reasoning from predicting any text?

Quiet-STaR trains language models to generate rationales at every token position during pretraining on arbitrary internet text, enabling general reasoning without task-specific datasets. Rationale quality is judged by predictive accuracy rather than labeled correctness, allowing reasoning competence to emerge as a side effect of improved language modeling.

Do reasoning traces show how models actually think?

LLM reasoning traces perform as persuasive appearances rather than reliable explanations of computation. Invalid logical steps perform nearly as well as valid ones, and corrupted traces generalize comparably, showing that semantic correctness is not what produces the performance gains.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a reasoning-capability researcher re-testing claims about single-example activation in LLMs. The question remains open: *what mechanism allows one training example to unlock reasoning, and how brittle is that unlock outside its native distribution?*

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat as perishable constraints to re-test:
  • One math example in RLVR pushes accuracy from 36% to 73%+, and gains persist for 1,400 steps post-memorization — suggesting the example triggers latent reasoning, not teaches it (~2025, arXiv:2504.20571).
  • Only ~20% of tokens ('high-entropy forking points') carry learning signal; training on just those tokens matches full training, explaining why one rich example suffices (~2026, arXiv:2601.06176).
  • Chain-of-thought degrades under distribution shift — signature of *imitating* reasoning form, not performing abstract inference (~2026, arXiv:2506.02878).
  • Models are semantic reasoners, not symbolic; strip familiar semantics, performance collapses even with correct rules present (~2023, arXiv:2305.14825).
  • Quiet-STaR shows reasoning emerges as a side effect of language modeling on arbitrary text, needing only prediction-improvement signal (~2024, arXiv:2403.09629).

Anchor papers (verify; mind their dates):
  • arXiv:2504.20571 (2025) — RLVR single-example math activation.
  • arXiv:2506.02878 (2026) — CoT as imitation, not true reasoning.
  • arXiv:2403.09629 (2024) — Quiet-STaR token-level rationale learning.
  • arXiv:2602.06176 (2026) — LLM reasoning failure modes.

Your task:
  (1) RE-TEST EACH CONSTRAINT. For RLVR's 36→73% jump: has newer RL orchestration (multi-agent rollout, memory-augmented reward, on-policy exploration) pushed that ceiling higher, or does the distribution-brittleness remain? For the 20% forking-point claim: do recent interpretability tools or causal steering validate this, or has scaling revealed different bottlenecks? For CoT-as-imitation: do recent evals (e.g., adversarial semantics, symbolic consistency) confirm it's stylistic mimicry, or have decoding innovations (e.g., search, self-correction) recovered abstract reasoning? Separate what's still unsolved (generalization boundary? true symbolic reasoning?) from what may be solved (activation mechanism itself?).
  (2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Does any recent paper show single-example transfer *across* distributions, or new evidence that CoT is NOT mimicry? Flag if scaling or architectural change (e.g., mixture-of-experts, longer context, retrieval integration) has shifted the regime.
  (3) Propose 2 frontier questions that ASSUME the regime may have moved: (a) If activation is now reliable *across* domains via meta-learning or in-context adaptation, what does that tell us about what was actually latent? (b) If we can now distinguish *symbolic* forking points from semantic ones, can we train one example to unlock true reasoning instead of imitation?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
