INQUIRING LINE

Do base models contain latent reasoning that minimal training can unlock?

This explores whether reasoning ability is already sitting inside base models before any reasoning-specific training — so that methods like RL, fine-tuning, or clever prompting merely *reveal* it rather than build it from scratch.


The strongest case that reasoning is already present in base models, rather than installed by post-training, comes from work showing that five independent interventions (RL steering, critique fine-tuning, decoding tweaks, sparse-autoencoder feature steering, and RLVR) all surface the *same* reasoning that was already latent in base-model activations (Do base models already contain hidden reasoning ability?). The punchline is that post-training *selects* rather than *creates*: the bottleneck isn't acquiring the capability, it's eliciting it. Several other corners of the corpus independently converge on this. Modular 'cognitive tools', which are just sandboxed LLM calls with no RL at all, lifted GPT-4.1 on AIME 2024 from 26.7% to 43.3% purely by isolating reasoning operations the model could already perform (Can modular cognitive tools unlock reasoning without training?). And reasoning verbosity turns out to be a single linear direction you can steer with a vector extracted from 50 paired examples, with no retraining (Can we steer reasoning toward brevity without retraining?). When behavior bends to a handful of examples or a steering vector, that's the signature of something already present being redirected.
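The steering result rests on a simple recipe: average the hidden states of concise and verbose examples, take the difference, and add a scaled copy back in at inference. What follows is a minimal sketch of that generic difference-of-means idea. It assumes the hidden states have already been read out of a model, and the layer, the scale `alpha`, and both function names are illustrative assumptions; the Activation-Steered Compression method may differ in its details.

```python
import numpy as np

def extract_steering_vector(concise_acts: np.ndarray, verbose_acts: np.ndarray) -> np.ndarray:
    """Difference of mean hidden states between concise and verbose examples.

    Each array has shape (n_pairs, d_model): one hidden-state vector per example,
    read from the same (hypothetical) layer of the model.
    """
    if concise_acts.shape != verbose_acts.shape:
        raise ValueError("expected paired activations with matching shapes")
    return concise_acts.mean(axis=0) - verbose_acts.mean(axis=0)

def steer(hidden: np.ndarray, vector: np.ndarray, alpha: float) -> np.ndarray:
    """Shift every position's hidden state along the steering vector.

    alpha > 0 pushes toward the 'concise' side; alpha < 0 toward 'verbose'.
    """
    return hidden + alpha * vector

# Toy usage with synthetic activations (no real model involved): 50 pairs that
# differ only along one direction, which the mean difference recovers.
rng = np.random.default_rng(0)
d_model = 16
true_direction = rng.normal(size=d_model)
verbose = rng.normal(size=(50, d_model))
concise = verbose + true_direction
v = extract_steering_vector(concise, verbose)
print(np.allclose(v, true_direction))
```

Mean differencing works because pairing cancels whatever the two sets share and leaves the direction along which they differ. On real activations the recovered vector is only an estimate of that direction.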

Where it gets more interesting is *what kind* of reasoning is actually latent. A skeptical thread argues that what gets unlocked is pattern-completion, not formal inference. When semantic content is stripped from a task, LLM performance collapses even with the correct rules sitting in context: models lean on learned associations, not symbolic logic (Do large language models reason symbolically or semantically?). Chain-of-thought looks the same under scrutiny: it reproduces familiar reasoning *schemata* from training and degrades predictably under distribution shift, the tell of imitation rather than emergent capability (Does chain-of-thought reasoning reveal genuine inference or pattern matching?). Entailment judgments turn out to track whether the hypothesis was memorized, not whether the premise supports it (Do LLMs predict entailment based on what they memorized?). So 'latent reasoning' might be real *and* shallow at once: minimal training reliably unlocks the forms of reasoning the model saw in pretraining, but doesn't conjure logic that was never in the distribution.
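The random-premise control behind the attestation-bias finding can be sketched as a simple tally. Everything here is a hypothetical illustration, not McKenna et al.'s code: the record type, the function names, and the assumption that each hypothesis already carries an "attested" label are all invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class RandomPremiseProbe:
    # Hypothetical record: whether the hypothesis is "attested" (assumed to be
    # labeled beforehand) and whether the model still answered "entails" after
    # the original premise was replaced by an unrelated, randomly drawn premise.
    hypothesis_attested: bool
    predicted_entail: bool

def entail_rate_by_attestation(probes: List[RandomPremiseProbe]) -> Dict[bool, float]:
    """Share of 'entails' answers under random premises, split by attestation."""
    rates: Dict[bool, float] = {}
    for attested in (True, False):
        group = [p for p in probes if p.hypothesis_attested == attested]
        rates[attested] = (
            sum(p.predicted_entail for p in group) / len(group) if group else float("nan")
        )
    return rates

def attestation_gap(probes: List[RandomPremiseProbe]) -> float:
    """Positive gap: the model says 'entails' more often when it has seen the hypothesis.

    A model reasoning from the premise should rarely answer 'entails' for random
    premises in either group, so its gap should sit near zero.
    """
    rates = entail_rate_by_attestation(probes)
    return rates[True] - rates[False]
```

Randomizing the premise is what makes the comparison informative: once the premise carries no support, a systematic gap between the two groups points to the hypothesis, not the premise, driving the label.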

There's also a limit on how far 'minimal' can go. Non-reasoning models can't close the gap with reasoning models simply by being handed more inference compute: the training regime instills a *protocol* that makes extra tokens productive in the first place (Can non-reasoning models catch up with more compute?). That sharpens the question: minimal training may unlock latent capability, but apparently *some* structural training is still doing real work that prompting alone can't replace.

The most provocative adjacent idea is that verbalized chain-of-thought was never the reasoning itself, just a visible byproduct. Depth-recurrent architectures, Heima, and Coconut all scale test-time reasoning by iterating in hidden state, with no spoken intermediate steps, suggesting verbalization is a training artifact rather than a requirement (Can models reason without generating visible thinking tokens?). Pushed further, Quiet-STaR shows reasoning competence can emerge as a *side effect* of ordinary next-token prediction when the model is trained to generate rationales at every token (Can models learn reasoning from predicting any text?), and Energy-Based Transformers reach System-2-style deliberation from unsupervised learning alone, via gradient-descent energy minimization at inference with no domain-specific scaffolding (Can energy minimization unlock reasoning without domain-specific training?). If reasoning can fall out of plain language modeling and out of inference-time optimization, the latent-capability story stops being surprising and starts looking like the default.
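Inference as energy minimization can be shown with a toy example: a prediction is refined by gradient descent on an energy function, so more descent steps means more inference compute. This is a sketch in the spirit of Energy-Based Transformers, not their implementation. The quadratic energy, the matrix `W`, the penalty `lam`, the step size, and the function names are all illustrative assumptions. In an actual EBT the energy is a learned Transformer and its gradient comes from autodiff.

```python
import numpy as np

def energy(x: np.ndarray, y: np.ndarray, W: np.ndarray, lam: float = 0.1) -> float:
    """Toy energy: low when the prediction y agrees with W @ x, plus a small norm penalty."""
    r = y - W @ x
    return float(0.5 * r @ r + 0.5 * lam * y @ y)

def energy_grad_y(x: np.ndarray, y: np.ndarray, W: np.ndarray, lam: float = 0.1) -> np.ndarray:
    """Analytic gradient of the toy energy with respect to the prediction y."""
    return (y - W @ x) + lam * y

def think(x: np.ndarray, W: np.ndarray, steps: int, lr: float = 0.2, lam: float = 0.1):
    """Refine a prediction by gradient descent on the energy.

    Each step spends more inference compute; the returned trace records the energy
    after every step so the effect of 'thinking longer' can be inspected.
    """
    y = np.zeros(W.shape[0])  # initial guess
    trace = [energy(x, y, W, lam)]
    for _ in range(steps):
        y = y - lr * energy_grad_y(x, y, W, lam)
        trace.append(energy(x, y, W, lam))
    return y, trace

# Toy usage: a short and a long "thinking" budget on the same input.
rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))
x = rng.normal(size=3)
y_short, _ = think(x, W, steps=2)
y_long, _ = think(x, W, steps=50)
print(energy(x, y_short, W), energy(x, y_long, W))
```

With this convex toy energy and a step size below 2 / (1 + lam), every step lowers the energy and the iterates converge to `W @ x / (1 + lam)`. A learned energy landscape offers no such guarantee.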

The thing worth carrying away: 'unlocking latent reasoning' is best read as *elicitation engineering*, not capability creation, and the open frontier is whether you can elicit reasoning the base model never imitated. Latent-thought language models hint at the next move: adding scaling dimensions beyond raw parameters, so the latent capacity itself can be grown rather than just tapped (Can latent thought vectors scale language models beyond parameters?).


Sources (11 notes)

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Can modular cognitive tools unlock reasoning without training?

Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.

Can we steer reasoning toward brevity without retraining?

Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.

Do LLMs predict entailment based on what they memorized?

McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random-premise experiments show models maintain high entailment predictions when hypotheses are attested, indicating they respond to memorized propositions rather than to premise-hypothesis relationships.

Can non-reasoning models catch up with more compute?

Reasoning models persistently outperform non-reasoning models regardless of inference budget because training instills a reasoning protocol that makes additional tokens productive. The gap is fundamentally about deployment mechanisms and training structure, not raw capability.

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.

Can models learn reasoning from predicting any text?

Quiet-STaR trains language models to generate rationales at every token position during pretraining on arbitrary internet text, enabling general reasoning without task-specific datasets. Rationale quality is judged by predictive accuracy rather than labeled correctness, allowing reasoning competence to emerge as a side effect of improved language modeling.

Can energy minimization unlock reasoning without domain-specific training?

Energy-Based Transformers assign energy values to input-prediction pairs and use gradient-descent minimization for inference, yielding up to a 35% higher training scaling rate and 29% larger gains from inference-time compute than Transformer++, while generalizing better on out-of-distribution data without domain-specific scaffolding.

Can latent thought vectors scale language models beyond parameters?

Latent-Thought Language Models achieve superior sample and parameter efficiency by coupling fast local variational learning with slow global decoder learning. This dual-rate scheme scales few-shot reasoning across both model and latent size, creating independent scaling dimensions beyond traditional parameter scaling.
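The dual-rate scheme in the Latent-Thought Language Models note can be illustrated with a deliberately small stand-in. This is a sketch under loud assumptions, not the LTM training procedure. A linear decoder replaces the Transformer. MAP point estimates with a Gaussian-prior penalty replace the variational posterior. Latents are warm-started across outer steps. `dual_rate_fit` and all of its hyperparameters are hypothetical. What it shows is only the schedule: many large steps on per-sequence latents, then one small step on the shared decoder.

```python
import numpy as np

def objective(Y, Z, W, lam):
    """Squared reconstruction error of Y ~ Z @ W.T plus a Gaussian-prior penalty on Z."""
    R = Y - Z @ W.T
    return float(0.5 * np.sum(R * R) + 0.5 * lam * np.sum(Z * Z))

def dual_rate_fit(Y, d_latent, outer_steps=200, inner_steps=10, slow_scale=0.1, lam=0.1, seed=0):
    """Alternate fast per-sequence latent updates with slow shared-decoder updates.

    Y: (n_sequences, d_out) toy targets. Returns decoder W, latents Z, and the
    objective recorded after every outer step.
    """
    rng = np.random.default_rng(seed)
    n, d_out = Y.shape
    W = 0.1 * rng.normal(size=(d_out, d_latent))  # global decoder: slow rate
    Z = np.zeros((n, d_latent))                   # one latent per sequence: fast rate
    history = [objective(Y, Z, W, lam)]
    for _ in range(outer_steps):
        # Fast, local: several full-size steps on the latents with the decoder frozen.
        # Step 1/L, where L = ||W||_2^2 + lam bounds the curvature in Z.
        lr_z = 1.0 / (np.linalg.norm(W, 2) ** 2 + lam)
        for _ in range(inner_steps):
            grad_Z = -(Y - Z @ W.T) @ W + lam * Z
            Z = Z - lr_z * grad_Z
        # Slow, global: one small step on the decoder with the latents frozen.
        # Step slow_scale / L, where L = ||Z||_2^2 bounds the curvature in W.
        lr_w = slow_scale / (np.linalg.norm(Z, 2) ** 2 + 1e-8)
        grad_W = -(Y - Z @ W.T).T @ Z
        W = W - lr_w * grad_W
        history.append(objective(Y, Z, W, lam))
    return W, Z, history

# Toy usage on synthetic rank-3 data.
rng = np.random.default_rng(42)
Y = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 8))
W, Z, history = dual_rate_fit(Y, d_latent=3)
print(history[0], history[-1])
```

Tying each step size to a curvature bound (1/L for the latents, slow_scale/L for the decoder) makes every update a descent step on this objective, so the recorded history is non-increasing even though the objective is not jointly convex.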

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst tasked with re-evaluating whether base models contain latent reasoning that minimal training unlocks—a question that may have shifted as models and methods have evolved. A curated library (2023–2025) made these claims, now dated:

**What a curated library found — and when (dated claims, not current truth):**
- Five different post-training interventions (RL steering, critique fine-tuning, decoding tweaks, sparse-autoencoder steering, RLVR) surface the same latent reasoning already in base-model activations, suggesting post-training *selects* rather than *creates* (~2025).
- Cognitive tools (modular sandboxed LLM calls, zero RL) lifted GPT-4.1 on AIME 2024 from 26.7% to 43.3% by isolating operations the model could already perform (~2025).
- Reasoning verbosity is a single linear direction in activation space, steerable via 50 examples with no retraining (~2025).
- Chain-of-thought reproduces learned *schemata* from training and degrades under distribution shift—signature of imitation, not emergent inference (~2025).
- Depth-recurrent and latent-thought models scale test-time reasoning in hidden state without verbalized steps; reasoning may emerge as a side effect of next-token prediction rather than requiring explicit scaffolding (~2025).

**Anchor papers (verify; mind their dates):**
- arXiv:2305.14825 (2023): In-Context Semantic vs. Symbolic Reasoning
- arXiv:2403.09629 (2024): Quiet-STaR—Rationale Generation at Token Level
- arXiv:2502.05171 (2025): Latent Reasoning via Recurrent Depth
- arXiv:2507.02092 (2025): Energy-Based Transformers and Unsupervised System-2 Thinking

**Your task:**
(1) **RE-TEST EACH CONSTRAINT.** For each claim above, judge whether newer model scales, improved fine-tuning methods, or advances in test-time compute (e.g., longer rollouts, better verifiers, multi-agent orchestration) have since *relaxed* the "minimal training" boundary or *overturned* the "selection vs. creation" framing. Is the bottleneck still elicitation, or can sufficiently structured post-training *add* reasoning capability beyond what pretraining imitated? Separate durable questions from perishable limitations.

(2) **Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months.** Locate papers showing either (a) base models do *not* contain the latent reasoning claimed, or (b) post-training does *install* new capability, not merely unlock it. Flag disagreement on what constitutes "reasoning" itself.

(3) **Propose 2 research questions that ASSUME the regime may have moved:** e.g., "Can reasoning capability be grown—not just unlocked—by scaling latent dimensions?" or "Does test-time compute eventually substitute for training-time structure, or are they orthogonal?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
