SYNTHESIS NOTE

Do chain-of-thought traces actually help users understand model reasoning?

Chain-of-thought explanations are often presented as transparency tools, but do they genuinely improve human understanding or create an illusion of interpretability? A human-subject study tests whether traces help users follow and evaluate model reasoning.

Synthesis note · 2026-02-22 · sourced from Reasoning Critiques

A common assumption behind CoT traces: they serve as explanations. The model shows its work, users can follow the reasoning, trust is established. This assumption turns out to be wrong in a specific and quantifiable way.

Empirical findings from a 100-participant human-subject study:

The traces that are most useful for the model to generate correct answers are least useful for humans trying to understand those answers. The two objectives pull in opposite directions.

The mechanism: CoT traces used for SFT are optimized to be a training signal — to push the model toward correct token sequences through backpropagation. The properties that make a trace useful for training (complex recursive structure, non-linear exploration, self-doubt and revision cycles) are exactly the properties that make it cognitively opaque to humans.

This has a design implication that some systems are already acting on: GPT-OSS models generate a CoT trace (for model performance), a summary (for human communication), and a final answer. The trace is not shown to users. This separation acknowledges the decoupling.
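A minimal sketch of that three-part separation, assuming a hypothetical response container. The names `ReasoningResponse`, `user_view`, and `audit_view` are illustrative only and are not the API of any GPT-OSS release. The trace is stored, but only the summary and final answer reach the user-facing view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningResponse:
    """Hypothetical container; the field names are illustrative assumptions."""
    trace: str    # raw chain-of-thought, kept for model performance and auditing
    summary: str  # short, human-oriented account of the reasoning
    answer: str   # final answer


def user_view(resp: ReasoningResponse) -> dict[str, str]:
    """Return only the artifacts intended for human readers (no raw trace)."""
    return {"summary": resp.summary, "answer": resp.answer}


def audit_view(resp: ReasoningResponse) -> dict[str, str]:
    """Return everything, including the raw trace, for developers or auditors."""
    return {"trace": resp.trace, "summary": resp.summary, "answer": resp.answer}


# Toy example, invented for illustration.
resp = ReasoningResponse(
    trace="Try x=3: 3*4=12, not 14. Wait, recheck. Try x=3.5: 3.5*4=14.",
    summary="Dividing 14 by 4 gives 3.5.",
    answer="x = 3.5",
)
print(user_view(resp))
```

Keeping the trace in a separate field rather than discarding it means the same response can serve both audiences: the user view stays readable, while the audit view preserves the artifact the model actually produced.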

The implication for AI transparency: showing users CoT traces is not showing them how the model reasons. It is showing them the model's training scaffold. What users need is a summary; what models need is the trace. Conflating the two in the name of "explainability" produces outputs that feel transparent without providing genuine interpretability.

This claim is distinct from the one in Do reasoning traces actually cause correct answers?, which warns against inferring intentional reasoning from traces. This note adds that even if you don't anthropomorphize, traces are the wrong artifact for human interpretability. Reading traces as intentional reasoning and reading them as user-facing explanations are both mistakes, in different ways.

Controlled user-study evidence: traces don't just fail to help — they actively mislead. The interpretability-rating gap documented above measures how understandable traces feel; a between-subject user study ("Evaluating the False Trust Engendered by LLM Explanations") measures whether they improve judgment, and finds the stronger result. Showing users reasoning traces or post-hoc explanations raises their acceptance of the model's answer regardless of whether the answer is correct — the explanations are persuasive but not informative. This sharpens the decoupling claim from "traces serve the model not the user" to "traces given to the user degrade their ability to detect errors." The only explanation format that restored discrimination in that study was a contrastive dual explanation arguing both sides (see Do explanations actually help users spot AI mistakes?) — i.e., the fix is not a better one-sided trace but an artifact that argues against the model's own output.
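The distinction between persuasive and informative can be made concrete with a small calculation. The sketch below computes how often users accept the model's answer when it is correct versus when it is incorrect, and takes the difference as a discrimination score. The function names and the toy trial counts are hypothetical illustrations, not the metric definitions or the data of the cited study.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    model_correct: bool  # was the model's answer actually correct?
    user_accepted: bool  # did the participant accept the model's answer?


def acceptance_rate(trials: list[Trial], model_correct: bool) -> float:
    """Fraction of trials in which the user accepted, for one correctness condition."""
    subset = [t for t in trials if t.model_correct == model_correct]
    if not subset:
        raise ValueError("no trials for this condition")
    return sum(t.user_accepted for t in subset) / len(subset)


def discrimination(trials: list[Trial]) -> float:
    """Acceptance on correct answers minus acceptance on incorrect answers."""
    return acceptance_rate(trials, True) - acceptance_rate(trials, False)


def make_trials(correct_accepted: int, incorrect_accepted: int, n: int = 4) -> list[Trial]:
    """Build n correct-answer and n incorrect-answer trials (toy data only)."""
    correct = [Trial(True, i < correct_accepted) for i in range(n)]
    incorrect = [Trial(False, i < incorrect_accepted) for i in range(n)]
    return correct + incorrect


# Invented numbers for illustration; NOT results from the study.
no_explanation = make_trials(correct_accepted=3, incorrect_accepted=2)
with_trace = make_trials(correct_accepted=4, incorrect_accepted=3)

for name, trials in [("no explanation", no_explanation), ("with trace", with_trace)]:
    print(name, acceptance_rate(trials, True), acceptance_rate(trials, False),
          discrimination(trials))
```

In this toy data, acceptance rises in both conditions once a trace is shown, yet the discrimination score is unchanged. That is the pattern "persuasive but not informative" describes: users agree more often without getting better at telling right answers from wrong ones.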


Source (enrichment): Flaws — "Evaluating the False Trust Engendered by LLM Explanations", https://arxiv.org/abs/2605.10930

Original note title

cot traces optimize model performance, not user interpretability — the two objectives are decoupled