INQUIRING LINE

Should LLM reasoning be studied as latent state trajectories rather than surface text?

This explores whether the real action in LLM reasoning happens in hidden internal states rather than the words the model writes out — and whether researchers should study it that way.


The corpus comes down fairly hard on "yes, mostly." The central note argues that reasoning operates primarily through hidden-state trajectories, with the surface chain-of-thought serving only as a partial, sometimes unfaithful interface to what is actually happening inside (see "Where does LLM reasoning actually happen during generation?"). The evidence comes from chain-of-thought faithfulness tests, feature steering, and layer-by-layer analysis, all of which suggest the written reasoning can diverge from the computation that produced the answer.
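To make "faithfulness test" concrete, here is a minimal sketch of one common style of check: truncate the chain of thought at each step and see whether the final answer moves. The notes do not say which specific tests they rely on, so treat this as an illustration only. `answer_fn`, `ignores_cot`, and `uses_cot` are hypothetical stand-ins, not real models or a real API.

```python
from typing import Callable, List


def truncation_sensitivity(
    answer_fn: Callable[[str, List[str]], str],
    question: str,
    cot_steps: List[str],
) -> float:
    """Return the fraction of truncation points at which the answer changes.

    answer_fn is a hypothetical stand-in for "ask the model for a final
    answer given the question and a (possibly truncated) chain of thought".
    A value near 0 means the answer barely depends on the written reasoning;
    a value near 1 means the answer tracks it closely.
    """
    if not cot_steps:
        return 0.0
    full_answer = answer_fn(question, cot_steps)
    changed = 0
    for k in range(len(cot_steps)):
        # Keep only the first k steps (k = 0 means no reasoning at all).
        if answer_fn(question, cot_steps[:k]) != full_answer:
            changed += 1
    return changed / len(cot_steps)


# Toy stand-ins for illustration only (assumptions, not real models).
def ignores_cot(question: str, steps: List[str]) -> str:
    # Always gives the same answer, whatever reasoning is shown.
    return "42"


def uses_cot(question: str, steps: List[str]) -> str:
    # Answer depends on how much reasoning is visible.
    return str(len(steps))


steps = ["step a", "step b", "step c"]
print(truncation_sensitivity(ignores_cot, "q", steps))
print(truncation_sensitivity(uses_cot, "q", steps))
```

A low score is only suggestive: it shows that the answer does not depend on the visible text, not what the hidden computation actually was.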

What makes this more than a single claim is how other corners of the collection independently point at the same gap between surface text and underlying mechanism. Mechanistic interpretability work finds that "understanding" isn't one thing but three coexisting tiers (features as directions, factual world-state connections, and compact circuits), layered as a patchwork rather than a clean hierarchy (see "Do language models understand in fundamentally different ways?"). That's a portrait you can only see by looking at internal structure, not output. Similarly, models default to surface-level shortcuts on theory-of-mind tasks rather than genuinely tracking mental states, and hybrid architectures that force explicit belief tracking outperform the LLM alone. That suggests the surface answer hides a shallower internal process than the text implies (see "Do large language models genuinely simulate mental states?").

The latent-state lens also reframes failure. If reasoning is a trajectory through hidden space, then failure is a trajectory that wanders. One note describes reasoning LLMs as "wandering explorers" whose search lacks validity, effectiveness, and necessity, so success probability drops exponentially as problems get deeper (see "Why do reasoning LLMs fail at deeper problem solving?"). That's a state-dynamics story, not a text story. Entailment work shows models keying off whether a hypothesis was memorized rather than whether the premise supports it: the surface output says "entailment," but the internal process is retrieval, not inference (see "Do LLMs predict entailment based on what they memorized?"). Likewise, when semantic content is stripped from a task, performance collapses even with correct rules in hand, suggesting the underlying machinery is semantic association, not symbolic manipulation (see "Do large language models reason symbolically or semantically?").
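The exponential drop with depth can be illustrated with a toy calculation. This sketch assumes each reasoning step succeeds independently with the same probability. That independence assumption is ours, added for illustration, and is not the note's own model. The name `success_probability` and the example rate of 0.9 are likewise hypothetical.

```python
def success_probability(per_step_success: float, depth: int) -> float:
    """Probability that all `depth` steps succeed.

    Assumes independent steps with an identical success rate, which is a
    simplifying assumption made for this illustration.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be in [0, 1]")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return per_step_success ** depth


# Even a high per-step rate decays quickly as the problem gets deeper.
for depth in (1, 5, 10, 20, 50):
    print(depth, round(success_probability(0.9, depth), 4))
```

Under these assumptions a per-step rate of 0.9 gives roughly 0.35 at depth 10 and roughly 0.005 at depth 50, which matches the qualitative picture of medium problems being solvable while deep ones become far harder.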

But the corpus doesn't let "ignore the text" off the hook entirely, and this is the part you might not expect. Two notes suggest that intervening on the surface *changes* the trajectory, meaning text isn't just a readout but a partial control surface. Structured argument prompts, which force models to name warrants and backing, catch reasoning failures that plain chain-of-thought lets slide (see "Can structured argument prompts make LLM reasoning more rigorous?"). And diffusion LLMs blur the line entirely: they embed reasoning directly into masked positions refined alongside the answer, so "reasoning" and "answer" stop being a clean before-and-after sequence and become parallel axes of a single refinement process (see "Can reasoning and answers be generated separately in language models?"). That hints the surface-vs-latent split is itself architecture-dependent, not a universal law.

So the sharper takeaway: studying latent trajectories isn't just a better microscope. It dissolves puzzles that look mysterious at the text level. Why do different models show distinct strategic "personalities" tied to game type rather than raw depth (see "Do large language models use one reasoning style or many?")? Why is causal reasoning reliably stronger than temporal reasoning (see "Why do LLMs handle causal reasoning better than temporal reasoning?")? These read as quirks of output until you treat them as signatures of where the internal trajectory has been well-grooved by training and where it hasn't. The text is the shadow; the trajectory is the thing casting it. But the shadow can still be poked to move the object.


Sources (10 notes)

Where does LLM reasoning actually happen during generation?

Evidence from CoT faithfulness tests, feature steering, and layer analysis suggests latent-state dynamics drive reasoning, while surface chain-of-thought serves as a partial interface. Hidden reasoning processes should be the default focus of study.

Do language models understand in fundamentally different ways?

Mechanistic interpretability reveals conceptual understanding (features as directions), state-of-world understanding (factual connections), and principled understanding (compact circuits). Crucially, higher tiers coexist with lower-tier heuristics rather than replacing them, creating a patchwork of capabilities.

Do large language models genuinely simulate mental states?

ChangeMyView and FANTOM benchmarks show LLMs fail at authentic perspective-taking in open-ended scenarios, despite succeeding on structured tasks. Hybrid Bayesian architectures that force explicit belief tracking outperform LLM-alone approaches, suggesting the gap is architectural rather than merely training-based.

Why do reasoning LLMs fail at deeper problem solving?

Current reasoning models lack the three properties of systematic exploration: validity, effectiveness, and necessity. This causes success probability to drop exponentially with problem depth, making medium problems solvable but deep problems catastrophically harder.

Do LLMs predict entailment based on what they memorized?

McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random premise experiments show models maintain high entailment predictions when hypotheses are attested, proving they respond to memorized propositions rather than premise-hypothesis relationships.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Can structured argument prompts make LLM reasoning more rigorous?

Applying Toulmin's argument model as explicit prompting steps (CQoT) improves LLM reasoning by forcing models to identify warrants and backing rather than skipping implicit premises. The method catches failures that standard chain-of-thought prompting allows.

Can reasoning and answers be generated separately in language models?

ICE shows that bidirectional attention in diffusion LLMs enables in-place prompting—embedding reasoning directly in masked positions refined alongside answers. Answer confidence converges early while reasoning continues refining, allowing early-exit mechanisms to cut compute by 50% while maintaining accuracy.

Do large language models use one reasoning style or many?

Analysis of 22 LLMs across behavioral game theory reveals three dominant profiles: GPT-o1 uses minimax reasoning, DeepSeek-R1 uses trust-based reasoning, and GPT-o3-mini uses belief-anticipation. Performance correlates with game structure, not raw reasoning depth.

Why do LLMs handle causal reasoning better than temporal reasoning?

ChatGPT excels at causal relations but struggles with temporal ordering because causal connectives are explicit and frequent in training data, while temporal order is often implicit and must be inferred contextually.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a capabilities researcher re-testing whether LLM reasoning should be studied as latent state trajectories rather than surface chain-of-thought text. The question remains open.

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026 and include:
• Surface chain-of-thought text diverges from internal computation; reasoning operates primarily through hidden-state trajectories, not visible text (2026).
• Models default to shallow surface-level shortcuts (memorization, retrieval, semantic association) rather than symbolic or genuinely causal reasoning, visible only via latent analysis (2023–2025).
• Reasoning LLMs exhibit exponential failure on deeper problems because internal trajectories "wander" without systematic validity or necessity constraints (2025).
• Intervening on surface text *does* move latent trajectories — structured argumentation prompts and diffusion-based in-place reasoning blur the surface-vs-latent boundary (2024–2025).
• Strategic reasoning profiles and causal-vs-temporal asymmetries emerge as signatures of training-grooved trajectories, not output quirks (2025).

Anchor papers (verify; mind their dates):
• arXiv:2305.14825 (2023) — semantic reasoning, not symbolic
• arXiv:2402.16837 (2024) — multi-hop latent reasoning
• arXiv:2505.20296 (2025) — wandering explorer failure modes
• arXiv:2604.15726 (2026) — latent reasoning thesis

Your task:
(1) RE-TEST EACH CONSTRAINT. For each finding above, determine whether post-2026 advances in model scale, mechanistic interpretability tooling (e.g., SAE scaling, layer-wise intervention), multi-step reasoning (chain-of-verification, tree-search), or evaluation harnesses have *relaxed* the surface-latent gap or *overturned* the claim that reasoning is primarily latent. Separate the durable question ("What is the mechanistic substrate of reasoning?") from the perishable claim ("Surface text is unfaithful") — say plainly which constraints still hold and what dissolved them.
(2) Surface the strongest contradicting or superseding work from the last ~6 months that argues surface reasoning IS mechanistically faithful, or that the latent-vs-surface distinction is a false dichotomy.
(3) Propose 2 research questions that *assume* the regime has shifted: e.g., if surface and latent are now provably entangled, how should we study their coupling? If reasoning is primarily latent, can we engineer architectures where the surface becomes a true steering interface?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
