Does the base model already contain latent reasoning capability?
This explores whether reasoning is something base models already hold in latent form, or something post-training has to create. The corpus leans hard toward the first reading, and the evidence is unusually convergent. Five independent mechanisms (RL steering, critique fine-tuning, decoding changes, sparse-autoencoder feature steering, and RLVR) all elicit reasoning that is already present in base model activations, which suggests the bottleneck is *elicitation, not acquisition* (Do base models already contain hidden reasoning ability?). The sharpest version of this claim is that RL post-training teaches a model *when* to reason, not *how*: hybrid models recover 91% of the performance gains just by routing tokens, and the activation vectors for reasoning strategies exist before any RL touches the weights (Does RL post-training create reasoning or just deploy it?).
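To make the "when, not how" distinction concrete, here is a minimal sketch, assuming reasoning strategies are represented as fixed direction vectors in a model's hidden-state space and that a separate controller only decides which one (if any) to add at each token. The function names, the toy 3-dimensional vectors, and the strategy labels are hypothetical illustrations, not the method from the cited work; real hidden states would come from a transformer's residual stream.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def unit(v: Vector) -> Vector:
    """Normalize a direction so steering strength is set only by alpha."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Activation steering: add alpha times the unit direction to a hidden state."""
    d = unit(direction)
    return [h + alpha * x for h, x in zip(hidden, d)]


def hybrid_pass(
    hiddens: List[Vector],
    choices: List[Optional[str]],
    strategies: Dict[str, Vector],
    alpha: float,
) -> List[Vector]:
    """Apply pre-existing strategy vectors only at tokens the controller selects.

    The strategy vectors never change (the "how"); only the per-token
    choices (the "when") differ between runs. None leaves a token untouched.
    """
    out = []
    for h, choice in zip(hiddens, choices):
        out.append(steer(h, strategies[choice], alpha) if choice else list(h))
    return out


# Hypothetical strategy directions in a toy 3-dimensional hidden space.
STRATEGIES: Dict[str, Vector] = {
    "backtrack": [1.0, 0.0, 0.0],
    "verify": [0.0, 3.0, 4.0],
}

if __name__ == "__main__":
    hiddens = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 0.0, 0.0]]
    choices = [None, "verify", "backtrack"]
    for h in hybrid_pass(hiddens, choices, STRATEGIES, alpha=2.0):
        print(h)
```

Changing only `choices` while holding `STRATEGIES` fixed is the toy analogue of the claim that post-training learns when to deploy reasoning rather than how to reason.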
The most striking single piece of evidence is that steering an SAE-identified reasoning feature can match or beat full chain-of-thought across six model families, and this reasoning mode switches on early in generation, even overriding surface instructions (Can we trigger reasoning without explicit chain-of-thought prompts?). If steering an internal feature direction can switch reasoning on, the capability was most likely already wired in. Two more findings point the same way from different angles. Latent reasoning (in depth-recurrent models, Heima, and Coconut) scales test-time compute through hidden-state iteration with no verbalized steps at all, implying that writing out your thinking is a training artifact rather than a requirement (Can models reason without generating visible thinking tokens?). And models trained on systematically irrelevant reasoning traces keep their solution accuracy, sometimes generalizing better out of distribution, which suggests traces act as computational scaffolding rather than meaningful logic (Do reasoning traces need to be semantically correct?). You can also unlock gains with no weight changes whatsoever: four modular cognitive tools, each run as a sandboxed LLM call, lifted GPT-4.1 on AIME 2024 from 26.7% to 43.3% purely by isolating operations (Can modular cognitive tools unlock reasoning without training?).
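The cognitive-tools result turns on isolation: each tool sees only its own input, not the whole running transcript. A minimal sketch of that pattern follows, assuming a generic `LLM` callable that maps a prompt string to a completion. The tool names and prompt templates are hypothetical placeholders, not the prompts from the cited work, and `FakeLLM` is a stand-in so the example runs offline.

```python
from typing import Callable, Dict, List

# Assumption: any text-in, text-out model client can be wrapped as this type.
LLM = Callable[[str], str]

# Hypothetical tool prompts; each is filled with only the tool's own input.
TOOL_PROMPTS: Dict[str, str] = {
    "understand_question": "Restate the problem and list its key quantities:\n{x}",
    "recall_related": "Recall a solved problem similar to this one:\n{x}",
    "examine_answer": "Check this candidate solution step by step:\n{x}",
    "backtracking": "Find the first wrong step and propose a fix:\n{x}",
}


def call_tool(llm: LLM, name: str, tool_input: str) -> str:
    """Run one tool as a fresh, isolated LLM call.

    The tool never sees the orchestrator's transcript, so its operation
    cannot be contaminated by earlier reasoning in a shared context.
    """
    return llm(TOOL_PROMPTS[name].format(x=tool_input))


def solve(llm: LLM, question: str, plan: List[str]) -> List[str]:
    """Run a fixed sequence of tools, feeding each the previous tool's output.

    A real orchestrator would let the model pick tools; a fixed plan keeps
    this sketch deterministic.
    """
    transcript = [question]
    current = question
    for name in plan:
        current = call_tool(llm, name, current)
        transcript.append(current)
    return transcript


class FakeLLM:
    """Offline stand-in that records every prompt it receives."""

    def __init__(self) -> None:
        self.prompts: List[str] = []

    def __call__(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return f"out{len(self.prompts)}"


if __name__ == "__main__":
    fake = FakeLLM()
    print(solve(fake, "What is 2 + 2?", ["understand_question", "examine_answer"]))
```

The design choice worth noticing is that isolation is enforced by the call structure itself rather than by instructions in a prompt.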
Here's the twist the reader might not expect: "latent capability" doesn't mean the base model is secretly a flawless reasoner. The same corpus that says reasoning is already present also says it is *shallow and bounded*. When semantic content is stripped from a task, LLM performance collapses even with the correct rules in context; models lean on memorized associations, not symbolic logic (Do large language models reason symbolically or semantically?). Entailment judgments track whether the hypothesis was seen in training, not whether the premise actually supports it (Do LLMs predict entailment based on what they memorized?). So the latent reasoning being elicited is real but distribution-bound, closer to learned pattern competence than to general inference.
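The entailment finding rests on a simple diagnostic: replace each premise with an unrelated one and see whether "entails" predictions survive. A minimal sketch of that probe follows, assuming a `predict(premise, hypothesis) -> bool` classifier. The harness, the toy data, and `memorized_predictor` (which only checks whether the hypothesis is in a "memorized" set) are hypothetical illustrations, not the original experimental setup.

```python
import random
from typing import Callable, Sequence, Tuple

Predictor = Callable[[str, str], bool]


def entail_rate(predict: Predictor, pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (premise, hypothesis) pairs predicted as entailment."""
    return sum(predict(p, h) for p, h in pairs) / len(pairs)


def random_premise_probe(
    predict: Predictor,
    pairs: Sequence[Tuple[str, str]],
    distractors: Sequence[str],
    seed: int = 0,
) -> Tuple[float, float]:
    """Compare entailment rates with real premises vs. random unrelated ones.

    A model that actually uses the premise should drop sharply once the
    premise is replaced; a similar rate suggests it reacts to the
    hypothesis alone.
    """
    rng = random.Random(seed)
    swapped = [(rng.choice(distractors), h) for _, h in pairs]
    return entail_rate(predict, pairs), entail_rate(predict, swapped)


# Toy "memorized" facts standing in for propositions attested in training data.
MEMORIZED = {"Paris is in France.", "Water boils at 100 C at sea level."}


def memorized_predictor(premise: str, hypothesis: str) -> bool:
    """Deliberately biased toy model: ignores the premise entirely."""
    return hypothesis in MEMORIZED


if __name__ == "__main__":
    pairs = [
        ("The Louvre is in Paris, the capital of France.", "Paris is in France."),
        ("The kettle was heated at sea level until it boiled.", "Water boils at 100 C at sea level."),
        ("The cat sat on the mat.", "The mat is red."),
    ]
    distractors = ["Bananas are yellow.", "The train left at noon."]
    print(random_premise_probe(memorized_predictor, pairs, distractors))
```

For the biased toy predictor both rates come out identical, which is the signature of premise-insensitive, memorization-driven judgments.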
That tension reframes the whole "reasoning cliff" debate. When reasoning models appear to collapse on hard problems, at least some of that is *execution* failure, not reasoning failure: text-only models know the algorithm but can't carry it out at scale, and tool-enabled versions solve problems beyond the supposed cliff (Are reasoning model collapses really failures of reasoning?). Mechanistic interpretability helps explain why the picture is messy: models hold understanding in hierarchical tiers (conceptual, state-of-world, and principled), and higher-tier circuits coexist with lower-tier heuristics rather than replacing them. The result is a patchwork, not a clean ladder (Do language models understand in fundamentally different ways?).
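The gap between knowing an algorithm and executing it can be made concrete with Tower of Hanoi, a puzzle often used in reasoning-cliff experiments (the cited notes do not name a specific task, so this choice is an assumption). The whole algorithm fits in a few lines, but the number of moves a text-only model must write out is 2^n - 1, which grows exponentially with the number of disks; a tool-enabled model can run the same few lines instead of transcribing every move.

```python
from typing import Dict, List, Tuple

Move = Tuple[int, str, str]  # (disk, from_peg, to_peg)


def hanoi(n: int, src: str = "A", dst: str = "C", via: str = "B") -> List[Move]:
    """Return the optimal move sequence for n disks (standard recursion)."""
    if n == 0:
        return []
    return (
        hanoi(n - 1, src, via, dst)    # park the n-1 smaller disks on the spare peg
        + [(n, src, dst)]              # move the largest disk
        + hanoi(n - 1, via, dst, src)  # restack the smaller disks on top of it
    )


def is_valid(n: int, moves: List[Move], src: str = "A", dst: str = "C") -> bool:
    """Simulate the moves; reject any illegal move or a wrong final state."""
    pegs: Dict[str, List[int]] = {"A": [], "B": [], "C": []}
    pegs[src] = list(range(n, 0, -1))  # bottom to top: largest disk first
    for disk, a, b in moves:
        if not pegs[a] or pegs[a][-1] != disk:
            return False  # the named disk is not on top of the source peg
        if pegs[b] and pegs[b][-1] < disk:
            return False  # a larger disk would land on a smaller one
        pegs[b].append(pegs[a].pop())
    return pegs[dst] == list(range(n, 0, -1))


if __name__ == "__main__":
    for n in (3, 10, 15):
        moves = hanoi(n)
        print(n, len(moves), is_valid(n, moves))
```

At 20 disks the same short recursion already implies 1,048,575 moves to transcribe, which is an execution-bandwidth problem rather than a question of whether the model knows the recursion.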
If you want to follow where this is heading rather than where it has been, the frontier work is making that latent reasoning *richer*. GRAM swaps deterministic latent updates for stochastic sampling, so a recursive reasoner can represent uncertainty and explore several solutions at once (Can stochastic latent reasoning help models explore multiple solutions?). Diffusion LLMs let reasoning and answer be generated separately: reasoning is embedded in masked positions and refined in place alongside the answer, and because answer confidence converges early while the reasoning keeps refining, early exit cuts compute by about half while maintaining accuracy (Can reasoning and answers be generated separately in language models?). The throughline across all of it: the base model already contains the seeds of reasoning, and the open research question is no longer whether it's there, but how to elicit, deepen, and trust it.
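The compute savings for diffusion LLMs come from early exit: the answer settles while the reasoning positions are still being refined. A minimal sketch of such a stopping rule follows, assuming per-step answer confidences are already available from the model. The function name, the threshold, the patience parameter, and the confidence trace are hypothetical, not taken from the cited work.

```python
from typing import Sequence


def early_exit_step(
    answer_conf: Sequence[float],
    threshold: float = 0.9,
    patience: int = 2,
) -> int:
    """Return how many refinement steps to run before exiting.

    answer_conf[t] is the model's confidence in its current answer after
    step t + 1. Exit after the first run of `patience` consecutive steps at
    or above `threshold`; if that never happens, run every step.
    """
    streak = 0
    for t, conf in enumerate(answer_conf):
        streak = streak + 1 if conf >= threshold else 0
        if streak >= patience:
            return t + 1
    return len(answer_conf)


if __name__ == "__main__":
    # Hypothetical trace: answer confidence settles early in a 10-step budget.
    confs = [0.40, 0.70, 0.92, 0.95, 0.96, 0.97, 0.97, 0.98, 0.98, 0.99]
    steps = early_exit_step(confs)
    print(steps, f"{steps / len(confs):.0%} of the full budget")
```

Requiring a short streak rather than a single high reading is a guard against exiting on one noisy spike in confidence.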
Sources (12 notes)
Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.
Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.
SAE-identified reasoning features can be directly steered to match or exceed chain-of-thought performance across six model families. This reasoning mode activates early in generation and overrides surface-level instructions, suggesting latent reasoning is a fundamental capability independent of explicit prompting.
Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.
Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.
Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME 2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.
When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.
McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random-premise experiments show models maintain high entailment predictions when hypotheses are attested, indicating that they respond to memorized propositions rather than to premise-hypothesis relationships.
Models confined to text-only generation cannot execute multi-step procedures at scale, even when they know the underlying algorithm. Tool-enabled models solve problems beyond the supposed reasoning cliff, suggesting the bottleneck is procedural execution bandwidth.
Mechanistic interpretability reveals three tiers of understanding: conceptual (features as directions), state-of-world (factual connections), and principled (compact circuits). Crucially, higher tiers coexist with lower-tier heuristics rather than replacing them, creating a patchwork of capabilities.
GRAM replaces deterministic latent updates with stochastic sampling, enabling models to represent distributions over solutions rather than single predictions. This allows handling of ambiguous problems and multiple valid strategies that deterministic designs cannot represent.
ICE shows that bidirectional attention in diffusion LLMs enables in-place prompting—embedding reasoning directly in masked positions refined alongside answers. Answer confidence converges early while reasoning continues refining, allowing early-exit mechanisms to cut compute by 50% while maintaining accuracy.