INQUIRING LINE

Does latent reasoning capability exist in base models before any training?

This explores whether reasoning ability is already sitting inside a base model's weights before any RL or instruction tuning — and what 'already there' actually means once you look at how the corpus probes it.


This explores whether reasoning ability is already present in a base model before any training, and the corpus answers with a fairly strong yes — but with an important twist about what kind of reasoning is sitting there. The clearest line of evidence is that five independent techniques — RL steering, critique fine-tuning, decoding tricks, sparse-autoencoder feature steering, and RLVR — all manage to draw out reasoning that's already detectable in base-model activations Do base models already contain hidden reasoning ability?. The takeaway is that post-training mostly *selects* a capability rather than *creating* it; the bottleneck is elicitation, not acquisition. A companion finding sharpens this into a slogan: RL teaches a model *when* to reason, not *how*. Hybrid setups recover ~91% of the gains just by routing which tokens get the reasoning treatment, and the activation directions for reasoning strategies exist before any RL touches the model Does RL post-training create reasoning or just deploy it?.

If reasoning is latent, you'd expect you could surface it without training at all — and you can. Modular 'cognitive tools' implemented as sandboxed model calls lifted GPT-4.1 on a hard math benchmark from 26.7% to 43.3% with zero RL, just by enforcing isolation between reasoning operations that plain prompting can't guarantee Can modular cognitive tools unlock reasoning without training?. In the same spirit, latent-space reasoning architectures scale test-time compute by iterating on hidden states rather than emitting visible thinking tokens, which suggests the verbalized chain-of-thought we see is partly a training artifact layered on top of computation that doesn't need words Can models reason without generating visible thinking tokens?.

Where does the latent capability come from in the first place? Pretraining itself plants it. Quiet-STaR shows reasoning competence can emerge as a side effect of better next-token prediction on arbitrary internet text — no task-specific reasoning dataset required Can models learn reasoning from predicting any text?. And an analysis of five million pretraining documents found that reasoning leans on broad, transferable *procedural* knowledge spread across many sources, distinct from factual recall which depends on narrow memorization of specific documents Does procedural knowledge drive reasoning more than factual retrieval?. So the raw material for reasoning is being accumulated during pretraining, which is why minimal post-training can later 'unlock' it.

But here's the part you didn't know you wanted to know: several notes push back on how much of this 'latent reasoning' is genuine inference versus learned imitation. When semantic content is stripped from a task, LLM performance collapses even with the correct rules handed to them in context — they're reasoning by semantic association, not symbolic logic Do large language models reason symbolically or semantically?. Chain-of-thought turns out to be constrained reproduction of familiar reasoning *forms* from training, and it degrades predictably the moment you shift task, length, or format — the signature of imitation, not a portable capability Does chain-of-thought reasoning reveal genuine inference or pattern matching? Does chain-of-thought reasoning actually generalize beyond training data?. Even more unsettling: models trained on deliberately corrupted, irrelevant reasoning traces do about as well as those trained on correct ones, implying the traces work as computational scaffolding rather than meaningful logic Do reasoning traces need to be semantically correct?.

Put together, the corpus lands on a nuanced position: yes, a base model carries latent reasoning machinery before any training, and that machinery is real enough that diverse, lightweight interventions can elicit it. But what's latent is closer to a vast store of procedural patterns bounded by the training distribution than a free-standing logical engine — which is also why a counter-thread argues training regime still matters more than raw inference compute, since non-reasoning models can't simply 'think longer' to close the gap Can non-reasoning models catch up with more compute?. The capability is there; whether it's reasoning or a very good impression of it is the question the corpus genuinely disagrees on.


Sources 11 notes

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Does RL post-training create reasoning or just deploy it?

Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.

Can modular cognitive tools unlock reasoning without training?

Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.

Can models learn reasoning from predicting any text?

Quiet-STaR trains language models to generate rationales at every token position during pretraining on arbitrary internet text, enabling general reasoning without task-specific datasets. Rationale quality is judged by predictive accuracy rather than labeled correctness, allowing reasoning competence to emerge as a side effect of improved language modeling.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.

Does chain-of-thought reasoning actually generalize beyond training data?

DataAlchemy experiments show CoT fails systematically under distributional shifts in task, length, and format. Models produce fluent but logically inconsistent reasoning — imitating reasoning form without valid underlying logic.

Do reasoning traces need to be semantically correct?

Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.

Can non-reasoning models catch up with more compute?

Reasoning models persistently outperform non-reasoning models regardless of inference budget because training instills a reasoning protocol that makes additional tokens productive. The gap is fundamentally about deployment mechanisms and training structure, not raw capability.

Research prompt for your LLMexpand ↓

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst tasked with stress-testing a dated claim about latent reasoning in base LLMs. The question remains open: Does reasoning capability exist in base models before any training?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat all as perishable:
• Five independent techniques (RL steering, critique fine-tuning, decoding tricks, sparse-autoencoder steering, RLVR) surface reasoning already present in base-model activations; post-training *selects* rather than *creates* (2025–2026).
• RL teaches models *when* to reason, not *how*; routing alone recovers ~91% of post-training gains without weight updates (2025–2026).
• Reasoning emerges as a side effect of next-token prediction on arbitrary internet text during pretraining, not from task-specific datasets (Quiet-STaR, 2024).
• But: LLM 'reasoning' collapses when semantics are stripped; models perform equally well on corrupted vs. correct reasoning traces, suggesting imitation of familiar forms, not genuine logic (2025–2026).
• Chain-of-thought is distribution-bounded; performance degrades predictably with task/format shifts — signature of imitation, not portable inference (2025–2026).

Anchor papers (verify; mind their dates):
• arXiv:2305.14825 (2023) — In-Context Semantic Reasoners rather than Symbolic Reasoners
• arXiv:2404.03629 (2024) — Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
• arXiv:2506.02878 (2025) — CoT is Not True Reasoning, It Is Just a Tight Constraint to Imitate
• arXiv:2605.19376 (2026) — Generative Recursive Reasoning

Your task:
(1) RE-TEST EACH CONSTRAINT. For the claim that latent reasoning is real but imitative, determine: Have newer models (post-2026 checkpoints), architectural changes (sparse MoE, hierarchical hidden states), or test-time scaling (scaling laws on longer inference chains) RELAXED the imitation bottleneck? Can semantic-stripped reasoning now survive? Has any paper shown genuine symbolic reasoning without distributional reliance? State plainly: Is the constraint still airtight, or has a specific technique pierced it?
(2) Surface the strongest CONTRADICTING work from the last ~6 months. Look for papers claiming base-model reasoning is either (a) far more robust than the library suggests, or (b) far more illusory.
(3) Propose 2 research questions that assume the regime has shifted: (A) If latent reasoning is procedural imitation, what architectural or training change would enable genuine compositional reasoning? (B) If newer scaling has overcome distribution-boundedness, what is the minimum intervention to unlock it?

Cite arXiv IDs; flag anything you cannot ground in a real paper.

Next inquiring lines