Do reasoning traces actually show how models think?
Explores whether the step-by-step narrative in reasoning traces reflects the actual computational dependencies inside the model, or whether traces are stylistic constructs that only resemble reasoning.
Reasoning traces from large reasoning models are non-linear — they backtrack, self-correct, verify locally — and that non-linearity breaks the two things we most want to do with them: evaluate correctness and monitor faithfulness. Stepwise scoring can flag a step as erroneous even when a later self-verification overrides it and the trace as a whole is correct. ReasoningFlow's response is to annotate traces as fine-grained directed acyclic graphs of discourse relations, so that the structure of the reasoning — not just its surface tokens — becomes an analyzable object.
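As a rough illustration of what annotating a trace as a DAG could look like, the sketch below stores steps as nodes and discourse relations as labeled edges, then checks that the result is acyclic. The step texts, the relation labels (`deduce`, `verify`, `correct`, `conclude`) and the names `Edge` and `is_dag` are illustrative assumptions, not ReasoningFlow's actual annotation schema.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Edge:
    src: int       # index of the step being built on
    dst: int       # index of the step that builds on it
    relation: str  # hypothetical discourse label, not the paper's label set


# Toy trace with one backtrack: step 1 is a wrong guess, step 2 catches it,
# step 3 redoes the work, step 4 states the answer.
STEPS = [
    "restate the problem",
    "guess x = 5",
    "check: x = 5 does not satisfy the equation",
    "solve again: x = 4",
    "answer: 4",
]
EDGES = [
    Edge(0, 1, "deduce"),
    Edge(1, 2, "verify"),
    Edge(2, 3, "correct"),
    Edge(0, 3, "deduce"),
    Edge(3, 4, "conclude"),
]


def is_dag(num_steps: int, edges: list[Edge]) -> bool:
    """Return True if the annotated relations contain no cycle."""
    # TopologicalSorter expects a mapping from node to its predecessors.
    preds: dict[int, set[int]] = {i: set() for i in range(num_steps)}
    for e in edges:
        preds[e.dst].add(e.src)
    try:
        # static_order() is lazy, so list() forces the cycle check.
        list(TopologicalSorter(preds).static_order())
        return True
    except CycleError:
        return False


print(is_dag(len(STEPS), EDGES))
```

Storing the relation label on the edge rather than on the step is what lets a backtrack ("correct") and a forward derivation ("deduce") point at the same step without being confused with each other.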
Two findings matter most, and they cut against the naive reading of traces. First, most erroneous steps are not used to derive the final answer: a trace can contain errors on branches that never feed the answer and still land correctly, so penalizing those steps is mis-targeted supervision. Second, and more unsettling: the mechanistic causal dependencies between steps (what actually influences what, internally) do not reflect the language-level discourse structure (what the trace says depends on what). The narrative the trace tells about its own reasoning is not the computation that produced the answer.
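Both findings can be phrased as simple graph computations. The sketch below is a toy version under stated assumptions: edges are plain `(src, dst)` pairs, only derivational edges are included, the "causal" edge set is hand-written for illustration, and Jaccard overlap is a stand-in similarity measure. The note does not say how ReasoningFlow measures mechanistic dependencies, so none of this should be read as the paper's method.

```python
def ancestors(target: int, edges: list[tuple[int, int]]) -> set[int]:
    """All steps that the target step transitively depends on."""
    preds: dict[int, set[int]] = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
    seen: set[int] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for p in preds.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def unused_error_fraction(
    errors: set[int], answer: int, edges: list[tuple[int, int]]
) -> float:
    """Share of erroneous steps that the answer does not depend on."""
    if not errors:
        return 0.0
    used = ancestors(answer, edges)
    return len(errors - used) / len(errors)


def edge_jaccard(a: list[tuple[int, int]], b: list[tuple[int, int]]) -> float:
    """Overlap between two edge sets (1.0 means identical)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Toy discourse graph: step 1 is a wrong guess on a dead-end branch.
DISCOURSE = [(0, 1), (1, 2), (0, 3), (3, 4)]
# Hypothetical "mechanistic" edges, invented purely for illustration.
CAUSAL = [(0, 3), (1, 4), (3, 4)]

print(unused_error_fraction({1}, answer=4, edges=DISCOURSE))
print(edge_jaccard(DISCOURSE, CAUSAL))
```

In this toy case the erroneous step is entirely off the answer's dependency path in the discourse graph, yet the invented causal graph has it feeding the answer directly. That is the shape of mismatch the second finding describes.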
That gap is the whole ballgame for interpretability. It extends "Do reasoning traces actually cause correct answers?": ReasoningFlow gives quantitative teeth to the worry that traces are stylistic surface rather than verified computation, by directly measuring the mismatch between discourse and mechanism. It also explains the failure described in "Why do standard process reward models fail on thinking traces?": a PRM that scores the linear surface will misread a structure that is actually a branching DAG with dead-end errors.
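To make the PRM point concrete, here is a minimal sketch contrasting a linear aggregate with a DAG-aware one. It assumes per-step scores in [0, 1] and uses the minimum step score as the trace score, which is one common aggregation choice and not necessarily what any particular PRM does. The scores, edges and function names are all hypothetical.

```python
def answer_ancestors(answer: int, edges: list[tuple[int, int]]) -> set[int]:
    """Steps the answer transitively depends on in the annotated graph."""
    preds: dict[int, set[int]] = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
    seen: set[int] = set()
    stack = [answer]
    while stack:
        node = stack.pop()
        for p in preds.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def linear_score(step_scores: list[float]) -> float:
    """Treat the trace as a flat sequence: the worst step decides."""
    return min(step_scores)


def dag_aware_score(
    step_scores: list[float], answer: int, edges: list[tuple[int, int]]
) -> float:
    """Only steps on the answer's dependency path can lower the score."""
    relevant = answer_ancestors(answer, edges) | {answer}
    return min(step_scores[i] for i in relevant)


# Step 1 is a wrong guess (low score); step 2 is the check that rejects it.
SCORES = [0.9, 0.1, 0.8, 0.9, 0.95]
EDGES = [(0, 1), (1, 2), (0, 3), (3, 4)]

print(linear_score(SCORES))
print(dag_aware_score(SCORES, answer=4, edges=EDGES))
```

The linear aggregate is dragged down by the dead-end guess, while the DAG-aware one ignores it. Given the discourse-mechanism gap above, the DAG-aware score is only as trustworthy as the graph it is computed on.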
The caution worth holding: ReasoningFlow also reports that LRMs from different base models and post-training data produce structurally similar traces, which could mean the discourse structure is a genuine, transferable property of reasoning — or that it is a shared stylistic convention learned from overlapping trace corpora. If the latter, the discourse-mechanism gap is less "the model hides its reasoning" and more "the trace is a genre, not a log." Either way, monitoring tools that read the discourse and assume it is the computation are reading the wrong object.
Inquiring lines that use this note as a source (5)
This note is a source for these synthesized inquiries.
- Why do language models produce reasoning traces that mimic human reasoning style?
- Is the structure of reasoning traces learned as a shared stylistic convention?
- What makes discourse structure different from mechanistic causal structure in traces?
- How do interpretive and evaluative disagreement show up differently in agent traces?
- Can flow concentration in reasoning traces predict model quality better than tokens?
Related concepts in this collection (3)
- Do reasoning traces actually cause correct answers?
Explores whether the intermediate 'thinking' tokens in R1-style models genuinely drive reasoning or merely mimic its appearance. Matters because false confidence in invalid traces could mask errors.
extends: ReasoningFlow quantifies the discourse-vs-computation gap that this note argues makes traces stylistic mimicry
- Why do standard process reward models fail on thinking traces?
Existing PRMs assume clean, sequential steps but reasoning models produce messy trajectories with branching and backtracking. Understanding this mismatch could improve how we supervise and evaluate exploratory reasoning.
grounds: the DAG structure ReasoningFlow recovers is exactly the branching that breaks linear-trace PRMs
- Can reasoning steps be dynamically pruned without losing accuracy?
This explores whether chain-of-thought reasoning contains redundant steps that can be identified and removed during inference. Understanding which steps matter could improve efficiency while maintaining correctness.
convergent-with: both taxonomize reasoning into step/relation types and find much of the trace does not drive the answer
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!
- LLM Reasoning Is Latent, Not the Chain of Thought
- What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
- Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens
- Thought Anchors: Which LLM Reasoning Steps Matter?
- Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think
Original note title
discourse-level reasoning graphs diverge from mechanistic dependencies — the language of a reasoning trace is not its computation