SYNTHESIS NOTE

Do reasoning traces actually show how models think?

Explores whether the step-by-step narrative in reasoning traces reflects the actual computational dependencies inside the model, or whether traces are stylistic constructs that only resemble reasoning.

Synthesis note · 2026-06-27 · sourced from Reasoning Critiques

Reasoning traces from large reasoning models (LRMs) are non-linear: they backtrack, self-correct, and verify locally. That non-linearity breaks the two things we most want to do with them, namely evaluating correctness and monitoring faithfulness. Stepwise scoring can flag a step as erroneous even when a later self-verification overrides it and the trace as a whole is correct. ReasoningFlow's response is to annotate traces as fine-grained directed acyclic graphs of discourse relations, so that the structure of the reasoning, and not just its surface tokens, becomes an analyzable object.
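A minimal sketch of what treating a trace as a graph makes possible follows. It assumes a toy encoding that is not ReasoningFlow's actual schema: the step texts, the relation labels (`"supports"`, `"refutes"`, `"verifies"`), the `DERIVATION_RELATIONS` set, and the helper `answer_ancestors` are all hypothetical. Each step is a node, each discourse relation is a labeled edge from an earlier step to the step that builds on it, and the steps the final answer rests on are its ancestors along derivation edges.

```python
from collections import defaultdict

# Hypothetical toy trace: step id -> text. Step 2 is an arithmetic slip
# that the trace later abandons.
steps = {
    0: "Let x be the number of apples; 3x + 2 = 14.",
    1: "Subtract 2: 3x = 12.",
    2: "So x = 5.",                        # erroneous step
    3: "Wait, check: 12 / 3 = 4, not 5.",
    4: "So x = 4.",
    5: "Verify: 3*4 + 2 = 14. Answer: 4.",
}

# Hypothetical discourse edges (source, target, relation): the target step
# relates to the source step in the labeled way.
edges = [
    (0, 1, "supports"),
    (1, 2, "supports"),
    (1, 3, "supports"),
    (2, 3, "refutes"),
    (3, 4, "supports"),
    (4, 5, "verifies"),
]

# Assumption: only these relations count as "used to derive" a later step.
DERIVATION_RELATIONS = {"supports", "verifies"}


def answer_ancestors(edges, answer_step, relations=DERIVATION_RELATIONS):
    """Return every step reachable backward from answer_step via the given relations."""
    parents = defaultdict(list)
    for src, dst, rel in edges:
        if rel in relations:
            parents[dst].append(src)
    seen, stack = set(), [answer_step]
    while stack:
        node = stack.pop()
        for p in parents[node]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


used = answer_ancestors(edges, answer_step=5)
erroneous = {2}
print(sorted(used))              # [0, 1, 3, 4]
print(sorted(erroneous - used))  # [2]
```

Counting the `refutes` edge as a dependency would pull step 2 back into the answer's ancestry, so deciding which relations count as "use" is a choice any real annotation scheme has to make explicit.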

Two findings matter most, and they cut against the naive reading of traces. First, most erroneous steps are not used to derive the final answer, which means a trace can be locally wrong everywhere that doesn't matter and still land correctly — penalizing those steps is mis-targeted supervision. Second, and more unsettling: the mechanistic causal dependencies between steps (what actually influences what, internally) do not reflect the language-level discourse structure (what the trace says depends on what). The narrative the trace tells about its own reasoning is not the computation that produced the answer.
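One hedged way to put a number on "the narrative is not the computation" is to treat both structures as edge sets over the same steps and measure their overlap. The sketch assumes the mechanistic edges have already been estimated by some causal method (for instance, intervening on a step and checking which later steps change). The edge sets, the function name `edge_agreement`, and the choice of precision, recall, and Jaccard overlap are illustrative assumptions, not the metric ReasoningFlow reports.

```python
def edge_agreement(discourse, mechanistic):
    """Compare two directed edge sets over the same step ids.

    precision: share of stated (discourse) edges that are also mechanistic dependencies
    recall:    share of mechanistic dependencies that the discourse states
    jaccard:   size of the intersection divided by size of the union
    """
    d, m = set(discourse), set(mechanistic)
    both = d & m
    union = d | m
    precision = len(both) / len(d) if d else 0.0
    recall = len(both) / len(m) if m else 0.0
    jaccard = len(both) / len(union) if union else 1.0
    return {"precision": precision, "recall": recall, "jaccard": jaccard}


# Hypothetical edge sets (source step, dependent step) for a 6-step trace.
discourse_edges = [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5)]
mechanistic_edges = [(0, 1), (1, 4), (0, 5), (4, 5)]

scores = edge_agreement(discourse_edges, mechanistic_edges)
print({k: round(v, 3) for k, v in scores.items()})
# {'precision': 0.4, 'recall': 0.5, 'jaccard': 0.286}
```

Low overlap in this toy sense means the trace's stated dependencies and the model's effective dependencies largely disagree. A real comparison would also have to decide how to treat indirect, multi-hop influence, which this edge-level check ignores.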

That gap is the whole ballgame for interpretability. It extends the question "Do reasoning traces actually cause correct answers?": ReasoningFlow gives quantitative teeth to the worry that traces are stylistic surface rather than verified computation, because it directly measures the mismatch between discourse and mechanism. It also helps answer "Why do standard process reward models fail on thinking traces?": a PRM that scores the linear surface will misread a structure that is actually a branching DAG with dead-end errors.
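The PRM point can be illustrated with a toy aggregation. One simple way to turn per-step PRM scores into a trace-level score is to take the minimum over all steps; the sketch contrasts that with taking the minimum only over the steps the answer depends on. The per-step scores, the set of answer-relevant steps, and both helper names are made up for illustration, and restricting aggregation this way is an assumption about what a graph-aware PRM might do, not a published method.

```python
def linear_score(step_scores):
    """Score the trace as a flat sequence: the weakest step anywhere sinks it."""
    return min(step_scores.values())


def graph_aware_score(step_scores, answer_relevant):
    """Score only the steps the final answer depends on (assumed precomputed from a trace DAG)."""
    return min(step_scores[s] for s in answer_relevant)


# Hypothetical per-step correctness scores from a PRM.
# Step 2 is a wrong intermediate claim that the trace later abandons.
step_scores = {0: 0.95, 1: 0.92, 2: 0.10, 3: 0.88, 4: 0.93, 5: 0.97}

# Hypothetical: steps the answer (step 5) derives from; step 2 is a dead end.
answer_relevant = {0, 1, 3, 4, 5}

print(linear_score(step_scores))                        # 0.1
print(graph_aware_score(step_scores, answer_relevant))  # 0.88
```

Under flat aggregation, a correct trace is rejected because of an error that never reached the answer. The graph-aware variant removes that false alarm but now depends on getting the graph right, which the discourse-mechanism gap makes nontrivial.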

The caution worth holding: ReasoningFlow also reports that LRMs from different base models and post-training data produce structurally similar traces, which could mean the discourse structure is a genuine, transferable property of reasoning — or that it is a shared stylistic convention learned from overlapping trace corpora. If the latter, the discourse-mechanism gap is less "the model hides its reasoning" and more "the trace is a genre, not a log." Either way, monitoring tools that read the discourse and assume it is the computation are reading the wrong object.


Original note title

discourse-level reasoning graphs diverge from mechanistic dependencies — the language of a reasoning trace is not its computation