INQUIRING LINE

Does the base model already contain latent reasoning capability?

This explores whether reasoning is something base models already hold in latent form (and post-training merely unlocks) versus something training builds from scratch — and how much evidence the corpus has on each side.


The corpus leans hard toward the first reading, and the evidence is unusually convergent. Five independent mechanisms (reinforcement learning steering, critique fine-tuning, decoding tweaks, sparse-autoencoder feature steering, and RLVR) all elicit reasoning that is already present in base model activations, which suggests the bottleneck is *elicitation, not acquisition* (see "Do base models already contain hidden reasoning ability?"). The sharpest version of this claim is that RL post-training teaches a model *when* to reason, not *how*: hybrid models recover 91% of the performance gains just by routing tokens, and the activation vectors for reasoning strategies exist before any RL touches the weights (see "Does RL post-training create reasoning or just deploy it?").

The most striking single piece of evidence is that steering one SAE-identified feature can match or beat full chain-of-thought across six model families, and this reasoning mode switches on early in generation, even overriding surface instructions (see "Can we trigger reasoning without explicit chain-of-thought prompts?"). If a single internal direction can flip reasoning on, the capability was very likely already wired in. Two more findings push the same way from different angles. Latent reasoning scales test-time compute through hidden-state iteration with no verbalized steps at all, implying that writing out the thinking is a training artifact rather than a requirement (see "Can models reason without generating visible thinking tokens?"). And models trained on systematically irrelevant reasoning traces maintain solution accuracy, which suggests traces act as computational scaffolding rather than meaningful logic (see "Do reasoning traces need to be semantically correct?"). Gains can also be unlocked with no weight changes whatsoever: four modular cognitive tools lifted GPT-4.1 on AIME 2024 from 26.7% to 43.3% purely by isolating operations (see "Can modular cognitive tools unlock reasoning without training?").
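The feature-steering result above can be pictured with a minimal sketch. It assumes steering means adding a scaled unit direction to a hidden state. The vectors, the `alpha` value, and all function names are hypothetical and are not taken from the cited work. In a real setup the direction would come from a trained sparse autoencoder and the hidden state from a model's residual stream.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project(h, direction):
    """Scalar projection of hidden state h onto a unit direction."""
    return sum(a * b for a, b in zip(h, direction))

def steer(h, direction, alpha):
    """Return h shifted by alpha along the given direction."""
    return [a + alpha * b for a, b in zip(h, direction)]

# Hypothetical 3-d example; real hidden states have thousands of dimensions.
reasoning_direction = normalize([0.0, 3.0, 4.0])  # stand-in for an SAE feature direction
hidden = [1.0, 0.5, -0.5]                         # stand-in for one token's hidden state

before = project(hidden, reasoning_direction)
steered = steer(hidden, reasoning_direction, alpha=2.0)
after = project(steered, reasoning_direction)

# Because the direction has unit length, the projection rises by exactly alpha.
print(f"feature activation before={before:.2f}, after={after:.2f}")
```

The point of the sketch is only that steering is a cheap additive intervention on activations: no weights change, which is why the result reads as elicitation rather than training.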

Here's the twist the reader might not expect: "latent capability" doesn't mean the base model is secretly a flawless reasoner. The same corpus that says reasoning is pre-present also says it's *shallow and bounded*. When semantic content is stripped from a task, LLM performance collapses even with the correct rules in context: models lean on memorized associations, not symbolic logic (see "Do large language models reason symbolically or semantically?"). Entailment judgments track whether the hypothesis was attested in training data, not whether the premise actually supports it (see "Do LLMs predict entailment based on what they memorized?"). So the latent reasoning being elicited is real but distribution-bound, closer to learned pattern competence than general inference.

That tension reframes the whole "reasoning cliff" debate. When reasoning models appear to collapse on hard problems, at least some of that is *execution* failure, not reasoning failure: text-only models know the algorithm but can't run it at scale, and tool-enabled versions sail past the supposed cliff (see "Are reasoning model collapses really failures of reasoning?"). Mechanistic interpretability fills in why the picture is messy: models hold understanding in hierarchical tiers, where higher-tier circuits coexist with lower-tier heuristics rather than replacing them (see "Do language models understand in fundamentally different ways?"). The result is a patchwork, not a clean ladder.
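To make the execution-versus-reasoning distinction concrete, here is a small sketch using Tower of Hanoi as an assumed stand-in puzzle. The text above does not name a specific task, so the choice is illustrative only. The algorithm fits in a few lines, yet the number of moves that must be written out grows as 2^n - 1, which is the kind of gap between knowing a procedure and executing it in text that the paragraph describes.

```python
def hanoi(n, src="A", dst="C", via="B"):
    """Yield the moves that transfer n disks from src to dst."""
    if n == 0:
        return
    yield from hanoi(n - 1, src, via, dst)   # clear the way
    yield (src, dst)                         # move the largest disk
    yield from hanoi(n - 1, via, dst, src)   # restack on top of it

# The rule set never changes, but the output length doubles with each disk.
for n in (3, 10, 15):
    move_count = sum(1 for _ in hanoi(n))
    print(f"{n} disks -> {move_count} moves")
```

A model that can state this recursion may still fail to emit tens of thousands of moves without a slip, while a model allowed to run the function as a tool does not face that bottleneck.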

If you want to follow where this is heading rather than where it's been, the frontier work is making that latent reasoning *richer*. GRAM swaps deterministic latent updates for stochastic sampling so a recursive reasoner can hold uncertainty and explore several solutions at once (see "Can stochastic latent reasoning help models explore multiple solutions?"), and diffusion LLMs decouple reasoning from answering, refining both in place and using early exit to cut compute by 50% while maintaining accuracy (see "Can reasoning and answers be generated separately in language models?"). The throughline across all of it: the base model already contains the seeds of reasoning. The open research question is no longer whether it's there, but how to elicit, deepen, and trust it.
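The contrast between deterministic and stochastic latent updates can be shown with a toy example. This is not GRAM itself: the update rule, the noise schedule, and every name below are assumptions made for illustration. The "problem" is finding z with z**2 = 4, which has two valid answers. A deterministic recursive update from a fixed start always lands on one of them, while a noisy update sampled many times covers both.

```python
import random

def refine(z, lr=0.1):
    """One deterministic latent update that moves z toward a root of z**2 = 4."""
    return z + lr * z * (4.0 - z * z)

def run(steps=60, sigma=0.0, rng=None, z0=0.1):
    """Iterate the latent update; if sigma > 0, add decaying Gaussian noise each step."""
    z = z0
    for t in range(steps):
        z = refine(z)
        if sigma > 0.0:
            z += rng.gauss(0.0, sigma * 0.8 ** t)  # noise shrinks as refinement proceeds
    return z

# Deterministic recursion: a single trajectory and a single answer.
deterministic = round(run())

# Stochastic recursion: many sampled trajectories from the same start.
rng = random.Random(0)  # fixed seed so the run is repeatable
stochastic = {round(run(sigma=0.3, rng=rng)) for _ in range(200)}

print("deterministic answer:", deterministic)
print("stochastic answers:", sorted(stochastic))
```

The early noise lets some trajectories cross to the other basin, so the set of samples represents both solutions. That is the sense in which a stochastic reasoner can hold a distribution over answers rather than a single prediction.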


Sources (12 notes)

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Does RL post-training create reasoning or just deploy it?

Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.

Can we trigger reasoning without explicit chain-of-thought prompts?

SAE-identified reasoning features can be directly steered to match or exceed chain-of-thought performance across six model families. This reasoning mode activates early in generation and overrides surface-level instructions, suggesting latent reasoning is a fundamental capability independent of explicit prompting.

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.

Do reasoning traces need to be semantically correct?

Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.

Can modular cognitive tools unlock reasoning without training?

Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Do LLMs predict entailment based on what they memorized?

McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random premise experiments show models maintain high entailment predictions when hypotheses are attested, indicating that they respond to memorized propositions rather than premise-hypothesis relationships.

Are reasoning model collapses really failures of reasoning?

Models confined to text-only generation cannot execute multi-step procedures at scale, even when they know the underlying algorithm. Tool-enabled models solve problems beyond the supposed reasoning cliff, suggesting the bottleneck is procedural execution bandwidth.

Do language models understand in fundamentally different ways?

Mechanistic interpretability reveals conceptual understanding (features as directions), state-of-world understanding (factual connections), and principled understanding (compact circuits). Crucially, higher tiers coexist with lower-tier heuristics rather than replacing them, creating a patchwork of capabilities.

Can stochastic latent reasoning help models explore multiple solutions?

GRAM replaces deterministic latent updates with stochastic sampling, enabling models to represent distributions over solutions rather than single predictions. This allows handling of ambiguous problems and multiple valid strategies that deterministic designs cannot represent.

Can reasoning and answers be generated separately in language models?

ICE shows that bidirectional attention in diffusion LLMs enables in-place prompting—embedding reasoning directly in masked positions refined alongside answers. Answer confidence converges early while reasoning continues refining, allowing early-exit mechanisms to cut compute by 50% while maintaining accuracy.
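The early-exit idea in the last note can be sketched as a simple stopping rule. This is a minimal illustration under stated assumptions, not the ICE procedure: the `threshold` and `patience` parameters, the function name, and the trajectory values are all made up. Each refinement step is assumed to report a current answer and a confidence, and refinement stops once the answer has been stable and confident for a few consecutive steps.

```python
def early_exit(trajectory, threshold=0.9, patience=2):
    """Return (answer, steps_used) for a sequence of (answer, confidence) steps.

    Stops once the same answer has had confidence >= threshold for
    `patience` consecutive steps; otherwise runs to the end.
    """
    streak = 0
    last = None
    for step, (answer, conf) in enumerate(trajectory, start=1):
        if conf >= threshold and answer == last:
            streak += 1      # same confident answer again
        elif conf >= threshold:
            streak = 1       # confident, but the answer just changed
        else:
            streak = 0       # not confident yet
        last = answer
        if streak >= patience:
            return answer, step
    return last, len(trajectory)

# Hypothetical refinement trace: the answer settles early, later steps only polish.
trajectory = [
    ("17", 0.40), ("42", 0.70), ("42", 0.93), ("42", 0.96),
    ("42", 0.97), ("42", 0.98), ("42", 0.98), ("42", 0.99),
]
answer, steps_used = early_exit(trajectory)
print(f"answer={answer}, used {steps_used} of {len(trajectory)} steps")
```

The savings in this toy trace come from how the made-up numbers were chosen, so they say nothing about real compute reductions. The sketch only shows why an answer that converges before the reasoning does makes early exit possible.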

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a mechanistic reasoning researcher. The question remains live: Does the base model already contain latent reasoning capability, or is reasoning fundamentally a post-training creation?

What a curated library found — and when (dated claims, not current truth): Papers spanning 2023–2026 converge on latent pre-existence:
• Five independent mechanisms (RL steering, critique fine-tuning, decoding, SAE feature steering, RLVR) all elicit reasoning from base activations, suggesting the bottleneck is elicitation, not acquisition (2025–2026).
• A single SAE-identified feature can be steered to match or beat chain-of-thought across six model families; the reasoning mode switches on early, overriding surface instructions (2025).
• Latent reasoning scales test-time compute via hidden-state iteration with no verbalized steps; traces are computational scaffolding, not meaningful logic (2025–2026).
• BUT: When semantic content is stripped, performance collapses; models lean on memorized associations, not symbolic logic; entailment judgments track hypothesis attestation, not logical entailment (2023–2024).
• Reasoning failures are often *execution* failures, not capability failures; tool-enabled versions bypass supposed cliffs (2026).

Anchor papers (verify; mind their dates):
• arXiv:2305.14825 (2023) — semantic vs. symbolic reasoning boundary.
• arXiv:2502.05171 (2025) — latent reasoning and test-time compute scaling.
• arXiv:2601.08058 (2026) — latent computational mode distinct from chain-of-thought.
• arXiv:2605.19376 (2026) — generative recursive reasoning, frontier work.

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer models (post-2026), methods (distillation, curriculum, hybrid architectures), tooling (SAE scaling, mechanistic probes), or orchestration (multi-agent, memory hierarchies) have since relaxed or overturned it. Separate the durable question (likely: what *depth* and *generality* of reasoning is latent?) from perishable limitations (e.g., "single features suffice" — does scaling break this?); cite what resolved it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Does any recent paper argue reasoning is *not* latent, or that post-training fundamentally *rewires* rather than elicits?
(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., if latent reasoning is real but shallow, how do you stack hierarchies? If execution is the bottleneck, what architectures unlock it?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
