SYNTHESIS NOTE

Do reasoning traces actually show how models think?

Explores whether the step-by-step narrative in reasoning traces reflects the actual computational dependencies inside the model, or whether traces are stylistic constructs that only resemble reasoning.

Synthesis note · 2026-06-27 · sourced from Reasoning Critiques

Reasoning traces from large reasoning models are non-linear — they backtrack, self-correct, verify locally — and that non-linearity breaks the two things we most want to do with them: evaluate correctness and monitor faithfulness. Stepwise scoring can flag a step as erroneous even when a later self-verification overrides it and the trace as a whole is correct. ReasoningFlow's response is to annotate traces as fine-grained directed acyclic graphs of discourse relations, so that the structure of the reasoning — not just its surface tokens — becomes an analyzable object.
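To make the DAG view concrete, here is a minimal sketch of a trace stored as typed edges, with a backward walk from the final answer to find the steps it actually rests on. It is an illustration only, not ReasoningFlow's annotation schema: the `Edge` type, the relation labels "derives" and "refutes", and the toy trace are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: int       # earlier step
    dst: int       # later step that refers back to src
    relation: str  # hypothetical label, e.g. "derives" or "refutes"


def steps_used_for_answer(edges, answer, follow=("derives",)):
    """Collect the answer step and every step reachable backwards from it,
    following only the relation types listed in `follow`."""
    parents = {}
    for edge in edges:
        if edge.relation in follow:
            parents.setdefault(edge.dst, []).append(edge.src)
    used, stack = set(), [answer]
    while stack:
        node = stack.pop()
        if node in used:
            continue
        used.add(node)
        stack.extend(parents.get(node, []))
    return used


# Toy trace (invented): step 2 is a wrong computation, step 3 checks and
# refutes it, step 4 redoes the work from step 1, step 5 states the answer.
edges = [
    Edge(1, 2, "derives"),
    Edge(2, 3, "refutes"),
    Edge(1, 4, "derives"),
    Edge(4, 5, "derives"),
]
erroneous = {2}

used = steps_used_for_answer(edges, answer=5)
unused_errors = erroneous - used  # erroneous steps the answer never rests on
```

A stepwise scorer would flag step 2, yet the backward walk never reaches it: the error sits on a dead-end branch that the self-verification in step 3 closed off.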

Two findings matter most, and they cut against the naive reading of traces. First, most erroneous steps are not used to derive the final answer, so a trace can contain local errors on branches the answer never depends on and still land correctly; penalizing those steps is mis-targeted supervision. Second, and more unsettling: the mechanistic causal dependencies between steps (what actually influences what, internally) do not reflect the language-level discourse structure (what the trace says depends on what). The narrative the trace tells about its own reasoning is not the computation that produced the answer.
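One way to picture the second finding is as disagreement between two edge sets over the same steps. The sketch below is a hedged illustration, not the measurement ReasoningFlow uses (this note does not specify it). It assumes the discourse edges come from annotation and the mechanistic edges from some separate causal analysis, and both example edge sets are invented.

```python
def edge_agreement(discourse_edges, mechanistic_edges):
    """Compare two sets of (source_step, target_step) dependency edges."""
    discourse = set(discourse_edges)
    mechanistic = set(mechanistic_edges)
    shared = discourse & mechanistic
    union = discourse | mechanistic
    return {
        # Share of stated dependencies that are also mechanistic ones.
        "precision": len(shared) / len(discourse) if discourse else 0.0,
        # Share of mechanistic dependencies that the trace also states.
        "recall": len(shared) / len(mechanistic) if mechanistic else 0.0,
        # Overall overlap; two empty graphs count as full agreement.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Invented example: the trace narrates a chain 1 -> 2 -> 3 -> 4, while the
# assumed mechanistic analysis finds step 4 depending directly on step 1.
discourse_edges = [(1, 2), (2, 3), (3, 4)]
mechanistic_edges = [(1, 4), (2, 3)]

agreement = edge_agreement(discourse_edges, mechanistic_edges)
```

Low precision would mean the trace narrates dependencies the model does not use; low recall would mean the model uses dependencies the trace never mentions.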

That gap is the whole ballgame for interpretability. It extends the note "Do reasoning traces actually cause correct answers?": ReasoningFlow gives quantitative teeth to the worry that traces are stylistic surface rather than verified computation, by directly measuring the mismatch between discourse and mechanism. It also explains the failure described in "Why do standard process reward models fail on thinking traces?": a PRM that scores the linear surface will misread a structure that is actually a branching DAG with dead-end errors.
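The PRM failure can be sketched with a toy aggregation. This assumes a trace score is the minimum of per-step scores, which is one possible aggregation chosen for illustration and not a claim about any particular PRM. The step scores and the set of steps on the answer's derivation path are invented inputs.

```python
def linear_trace_score(step_scores):
    """Score the trace as a flat sequence: the weakest step decides."""
    return min(step_scores.values())


def dag_aware_trace_score(step_scores, used_steps):
    """Score only the steps on the answer's derivation path.

    `used_steps` is assumed to come from a graph annotation of the trace.
    Raises ValueError if no scored step is on the path.
    """
    return min(score for step, score in step_scores.items() if step in used_steps)


# Invented scores: step 2 is a wrong computation that step 3 later rejects.
step_scores = {1: 0.9, 2: 0.1, 3: 0.8, 4: 0.9, 5: 0.95}
used_steps = {1, 4, 5}  # assumed derivation path of the final answer

linear = linear_trace_score(step_scores)
dag_aware = dag_aware_trace_score(step_scores, used_steps)
```

Under these assumptions the linear score is dragged down to 0.1 by an abandoned branch, while the path-restricted score stays at 0.9 for the same correct trace.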

The caution worth holding: ReasoningFlow also reports that LRMs from different base models and post-training data produce structurally similar traces, which could mean the discourse structure is a genuine, transferable property of reasoning — or that it is a shared stylistic convention learned from overlapping trace corpora. If the latter, the discourse-mechanism gap is less "the model hides its reasoning" and more "the trace is a genre, not a log." Either way, monitoring tools that read the discourse and assume it is the computation are reading the wrong object.

Related concepts in this collection 3

Do reasoning traces actually cause correct answers?
Explores whether the intermediate 'thinking' tokens in R1-style models genuinely drive reasoning or merely mimic its appearance. Matters because false confidence in invalid traces could mask errors.
extends: ReasoningFlow quantifies the discourse-vs-computation gap that this note argues makes traces stylistic mimicry

Why do standard process reward models fail on thinking traces?
Existing PRMs assume clean, sequential steps but reasoning models produce messy trajectories with branching and backtracking. Understanding this mismatch could improve how we supervise and evaluate exploratory reasoning.
grounds: the DAG structure ReasoningFlow recovers is exactly the branching that breaks linear-trace PRMs

Can reasoning steps be dynamically pruned without losing accuracy?
This explores whether chain-of-thought reasoning contains redundant steps that can be identified and removed during inference. Understanding which steps matter could improve efficiency while maintaining correctness.
convergent-with: both taxonomize reasoning into step/relation types and find much of the trace does not drive the answer

Original note title

discourse-level reasoning graphs diverge from mechanistic dependencies — the language of a reasoning trace is not its computation
