INQUIRING LINE

The chain-of-thought an AI writes out may be a partial readout of reasoning that already happened out of sight.

Does reasoning happen in hidden space or in generated tokens?

This explores whether an LLM's actual 'thinking' lives in the hidden states it never shows you, or in the visible chain-of-thought tokens it writes out — and the corpus suggests the honest answer is 'mostly the hidden space, with the text as a partial interface.'

This question is really asking where the work of reasoning gets done: in the model's internal hidden states, or in the words it generates on screen. The corpus leans hard toward the hidden space. The cleanest framing comes from a proposal to study reasoning as the formation of latent-state trajectories rather than as the surface text it produces. On this view, the written chain-of-thought is a partial interface onto a process that's already running underneath (note: Where does LLM reasoning actually happen during generation?). Architectures built to skip verbalization entirely back this up: depth-recurrent models, Heima, and Coconut all scale test-time compute by iterating hidden states instead of emitting tokens, which implies the visible 'thinking out loud' is a training artifact rather than a requirement for reasoning (note: Can models reason without generating visible thinking tokens?).

The most striking evidence that the two can come apart is mechanistic. Using a 'logit lens' to peek inside, researchers found that models trained with hidden CoT tokens compute the correct answer in layers 1–3 and then actively overwrite those representations in the final layers to emit format-compliant filler; the real reasoning is fully recoverable from the lower-ranked predictions the model chose not to say (note: Do transformers hide reasoning before producing filler tokens?). In the same spirit, activation probes show models often commit to an answer internally long before they finish writing their reasoning, especially on easy problems where the chain-of-thought is essentially performance, though on genuinely hard tasks the written steps do track real internal belief updates (note: Does chain-of-thought reasoning reflect genuine thinking or performance?).
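
The logit-lens observation above can be illustrated with a toy sketch. It decodes each layer's hidden state with one shared unembedding matrix and reports how the token ranking shifts with depth. Everything here is an assumption made for illustration: the three-token vocabulary, the matrix `W_U`, the four hidden states, and the function names are invented, and no real model is involved. Real implementations usually also apply the model's final normalization layer before unembedding, which this toy skips.

```python
# Toy logit lens: decode every layer's hidden state with the SAME unembedding
# matrix and watch how the top-ranked token changes with depth.
# All numbers are made up for illustration; this is not a real model.

VOCAB = ["4", "...", "the"]  # hypothetical vocabulary

# Hypothetical unembedding matrix: one row of weights per vocabulary token.
W_U = [
    [1.0, 0.0],    # "4"   (the correct answer in this toy setup)
    [0.0, 1.0],    # "..." (format-compliant filler)
    [-0.5, -0.5],  # "the" (an unrelated token)
]

# Hypothetical hidden state after each of four layers (2-dimensional).
hidden_by_layer = [
    [0.9, 0.1],  # layer 1: the answer direction dominates
    [1.2, 0.2],  # layer 2
    [1.0, 0.6],  # layer 3
    [0.7, 1.5],  # layer 4: the filler direction takes over
]

def logits(hidden):
    """Project a hidden state onto the vocabulary (matrix-vector product)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in W_U]

def ranked_tokens(hidden):
    """Return vocabulary tokens sorted from highest to lowest logit."""
    scores = logits(hidden)
    order = sorted(range(len(VOCAB)), key=lambda i: scores[i], reverse=True)
    return [VOCAB[i] for i in order]

def logit_lens(states):
    """Apply the lens to every layer; returns one token ranking per layer."""
    return [ranked_tokens(h) for h in states]

if __name__ == "__main__":
    for depth, ranking in enumerate(logit_lens(hidden_by_layer), start=1):
        print(f"layer {depth}: top={ranking[0]!r}, runner-up={ranking[1]!r}")
```

In this toy, the answer token ranks first in the early layers and drops to runner-up at the last layer, where the filler token wins. That mirrors the claim that the suppressed answer stays recoverable from lower-ranked predictions, without reproducing the cited experiment.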

So if the answer is mostly determined in hidden space, what are the tokens for? Several notes suggest they function more as computational scaffolding than as meaning. Models trained on deliberately corrupted, semantically wrong traces perform about as well as those trained on correct ones, and sometimes generalize better out of distribution, which is hard to square with the text being where the thinking happens (note: Do reasoning traces need to be semantically correct?). Relatedly, the format and spatial structure of a chain-of-thought shape reasoning far more than its logical content, and invalid CoT prompts work as well as valid ones: CoT is pattern-guided generation, not formal logic (note: What makes chain-of-thought reasoning actually work?).

But 'mostly hidden' isn't 'tokens don't matter,' and this is the twist worth carrying away: not all tokens are equal. A small minority of generated tokens carry almost all the reasoning load. 'Thinking' tokens like 'Wait' and 'Therefore' spike in mutual information with the correct answer, and suppressing them (but not an equal number of random tokens) damages reasoning (note: Do reflection tokens carry more information about correct answers?). Roughly 20% of tokens are high-entropy 'forking points' where the model genuinely decides, and training only on those matches or exceeds full training (note: Do high-entropy tokens drive reasoning model improvements?); pruning studies similarly show models internally rank tokens by functional importance, preserving symbolic computation first (note: Which tokens in reasoning chains actually matter most?). So the picture is layered: the bulk of reasoning is latent, but specific generated tokens are the visible joints where hidden trajectories pivot.
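
To make the idea of high-entropy 'forking points' concrete, here is a minimal sketch that scores each position in a chain by the Shannon entropy of its next-token distribution and keeps the top 20%. The distributions are invented, and the names `entropy_bits` and `forking_positions`, along with the fixed `fraction` cutoff, are assumptions for illustration. The selection rule used in the cited work may differ.

```python
import math

def entropy_bits(probs):
    """Shannon entropy (in bits) of one next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def forking_positions(distributions, fraction=0.2):
    """Indices of the highest-entropy positions, keeping the top `fraction`."""
    keep = max(1, round(len(distributions) * fraction))
    by_entropy = sorted(
        range(len(distributions)),
        key=lambda i: entropy_bits(distributions[i]),
        reverse=True,
    )
    return sorted(by_entropy[:keep])

# Made-up next-token distributions for a five-token reasoning chain.
toy_distributions = [
    [0.97, 0.02, 0.01],  # near-certain continuation
    [0.40, 0.35, 0.25],  # genuinely undecided: a candidate forking point
    [0.90, 0.05, 0.05],
    [0.99, 0.01, 0.00],
    [0.80, 0.10, 0.10],
]

if __name__ == "__main__":
    for i, dist in enumerate(toy_distributions):
        print(f"position {i}: {entropy_bits(dist):.3f} bits")
    print("forking positions:", forking_positions(toy_distributions))
```

With this toy data only position 1 is selected. In a training setup along the lines the note describes, a mask like this would decide which token positions contribute to the loss; that wiring is not shown here.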

The frontier of this question is dissolving the binary altogether. Large Concept Models reason over sentence embeddings in a language-agnostic space before decoding to any language (note: Can reasoning happen at the sentence level instead of tokens?), and diffusion LLMs decouple reasoning from answering entirely, refining 'thinking' in masked positions alongside the answer, with answer confidence often converging before the reasoning finishes (note: Can reasoning and answers be generated separately in language models?). The direction of travel: reasoning is a hidden-state process, and the generated tokens are a steerable, partly optional readout of it.

Sources (11 notes)

Where does LLM reasoning actually happen during generation?

Evidence from CoT faithfulness tests, feature steering, and layer analysis suggests latent-state dynamics drive reasoning, while surface chain-of-thought serves as a partial interface. Hidden reasoning processes should be the default focus of study.

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Does chain-of-thought reasoning reflect genuine thinking or performance?

Activation probes show models commit to answers internally long before finishing their reasoning on easy tasks, but on hard tasks the reasoning process tracks real belief updates with detectable inflection points. Probe-guided early exit reduces tokens by up to 80 percent without accuracy loss.
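
A probe-guided early exit of the kind this note mentions could look roughly like the sketch below. It assumes a hypothetical probe that reports, after each reasoning step, the answer the model currently favors and a confidence for it. Generation stops once the same answer has stayed above a threshold for a few consecutive steps. The readings, the `threshold` and `patience` values, and the function name are all invented for illustration; the stopping rule in the underlying work is not specified here.

```python
def early_exit_step(probe_readings, threshold=0.9, patience=2):
    """Return the 1-based step at which generation could stop, or None.

    probe_readings: list of (predicted_answer, confidence) pairs, one per
    reasoning step, as a hypothetical hidden-state probe might report them.
    """
    streak = 0
    last_answer = None
    for step, (answer, confidence) in enumerate(probe_readings, start=1):
        if confidence >= threshold:
            # Extend the streak only if the confident answer is unchanged.
            streak = streak + 1 if answer == last_answer else 1
        else:
            streak = 0
        last_answer = answer
        if streak >= patience:
            return step
    return None

# Made-up probe readings for a six-step chain of thought.
toy_readings = [
    ("A", 0.55),
    ("B", 0.60),
    ("B", 0.93),
    ("B", 0.95),
    ("B", 0.97),
    ("B", 0.98),
]

if __name__ == "__main__":
    stop = early_exit_step(toy_readings)
    print("stop after step:", stop, "of", len(toy_readings))
```

On the toy readings the function stops after step 4 of 6. Requiring a short streak rather than a single confident reading is a design choice meant to avoid exiting on a momentary spike, which matters most on hard tasks where the note says belief is still being updated.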

Do reasoning traces need to be semantically correct?

Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.

What makes chain-of-thought reasoning actually work?

Research shows training format shapes reasoning strategy 7.5× more than domain, demo position swings accuracy 20%, and invalid CoT prompts work as well as valid ones. CoT is pattern-guided generation, not formal logic.

Do reflection tokens carry more information about correct answers?

Specific tokens like "Wait" and "Therefore" show sharp spikes in mutual information with correct answers. Suppressing them harms reasoning while suppressing equal random tokens does not, and representation recycling improves accuracy 20%.
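
For readers who want the quantity itself, the sketch below computes textbook mutual information between two discrete variables from a table of joint counts. It is a simplification under stated assumptions: the variables here are a yes/no flag for "a reflection token appeared" and a yes/no flag for "the final answer was correct", and the counts are invented. How the cited work estimates mutual information from model representations is not described in this note, so nothing below should be read as its method.

```python
import math

def mutual_information(joint_counts):
    """Mutual information in bits from a dict mapping (x, y) to a count."""
    total = sum(joint_counts.values())
    count_x, count_y = {}, {}
    for (x, y), c in joint_counts.items():
        count_x[x] = count_x.get(x, 0) + c
        count_y[y] = count_y.get(y, 0) + c
    mi = 0.0
    for (x, y), c in joint_counts.items():
        if c == 0:
            continue  # zero-probability cells contribute nothing
        p_xy = c / total
        p_x = count_x[x] / total
        p_y = count_y[y] / total
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

# Made-up counts over 100 hypothetical reasoning chains:
# key = (reflection token present?, final answer correct?)
toy_counts = {
    (True, True): 40,
    (True, False): 10,
    (False, True): 20,
    (False, False): 30,
}

if __name__ == "__main__":
    print(f"MI = {mutual_information(toy_counts):.4f} bits")
```

A value of zero means the two flags are statistically independent, and one bit is the maximum for two binary variables. The toy counts land in between, since the reflection token is associated with correctness without determining it.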

Do high-entropy tokens drive reasoning model improvements?

Only ~20% of tokens exhibit high entropy as pivotal reasoning decision points; RLVR primarily adjusts these forking tokens. Training exclusively on them matches or exceeds full-gradient performance, revealing that the minority carries the learning signal.

Which tokens in reasoning chains actually matter most?

Greedy likelihood-preserving pruning reveals six functional token categories; symbolic computation tokens are preferentially preserved while grammar and meta-discourse are pruned first. Student models trained on these pruned chains outperform those trained on frontier-model compression.

Can reasoning happen at the sentence level instead of tokens?

Meta's Large Concept Model operates on sentence embeddings rather than tokens, reasoning in a language-agnostic space before decoding to any target language. This hierarchical approach with paragraph-level planning produces more coherent output than flat token generation.

Can reasoning and answers be generated separately in language models?

ICE shows that bidirectional attention in diffusion LLMs enables in-place prompting—embedding reasoning directly in masked positions refined alongside answers. Answer confidence converges early while reasoning continues refining, allowing early-exit mechanisms to cut compute by 50% while maintaining accuracy.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens (4.27 match)
Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs (4.27 match)
LLM Reasoning Is Latent, Not the Chain of Thought (3.52 match)
DeepSeek-R1 Thoughtology: Let's think about LLM Reasoning (3.46 match)
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity (3.42 match)
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens (2.63 match)
Hierarchical Reasoning Model (2.62 match)
Implicit Chain of Thought Reasoning via Knowledge Distillation (2.59 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a reasoning-process researcher tasked with re-testing a high-stakes claim about where LLM reasoning lives. The question: Does reasoning happen in hidden space or in generated tokens?

What a curated library found — and when (findings span 2024–2026; treat as dated claims, not current truth):
• Hidden-state trajectories, not surface text, carry the bulk of reasoning work; depth-recurrent and latent-reasoning architectures scale test-time compute by iterating hidden states, sidestepping verbalization (~2025).
• Logit lens and activation probes show models compute correct answers in early layers, then overwrite them in final layers; answers are often committed internally long before chain-of-thought finishes (~2024–2025).
• Corrupted reasoning traces perform as well as correct ones, and format/structure matters far more than semantic content — suggesting text is scaffolding, not the seat of reasoning (~2025).
• A small, high-entropy minority of tokens (~20%) carry almost all reasoning load; 'thinking tokens' like 'Wait' spike in mutual information with correctness (~2025–2026).
• Frontier: diffusion LLMs decouple reasoning from answering; Large Concept Models reason in language-agnostic embeddings before decoding (~2025–2026).

Anchor papers (verify; mind their dates):
• arXiv:2412.06769 (Dec 2024) — Continuous latent reasoning
• arXiv:2506.02867 (Jun 2025) — Thinking tokens as information peaks
• arXiv:2508.10736 (Aug 2025) — In-place prompting in diffusion LLMs
• arXiv:2604.15726 (Apr 2026) — Reasoning is latent, not chain-of-thought

Your task:
(1) RE-TEST EACH CONSTRAINT. For every claim above, judge whether newer models, scaling laws, instruction-tuning regimes, or reinforcement learning since mid-2026 have relaxed or overturned it. Has improved mechanistic interpretability changed what we can recover from hidden states? Do aligned models still exhibit the overwriting behavior, or does RLHF/constitutional training anchor reasoning more tightly to tokens? Separate the durable question (reasoning IS partly latent) from perishable limits (we can't access it, verbalization is optional). Cite what changed it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months that argues reasoning IS token-dependent, or that the hidden/token divide is a false dichotomy.
(3) Propose 2 research questions that assume the regime may have moved: e.g., 'If reasoning is latent-primary but token-steering is cheap, can we build a reasoning-alignment protocol that doesn't require interpretability?' or 'Do retrieval-augmented or tool-calling LLMs reconverge reasoning to the surface because external state matters?'

Cite arXiv IDs; flag anything you cannot ground in a real paper.
