SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling

Is reflection in reasoning models actually fixing mistakes?

Do the thinking steps that appear after a model's first answer represent genuine self-correction, or are they mostly confirming what the model already concluded? Understanding this matters for how we train and deploy reasoning systems.

Synthesis note · 2026-02-22 · sourced from Reasoning by Reflection
How should we allocate compute budget at inference time? How should researchers navigate LLM reasoning research?

The Hook

We've been watching reasoning models think and assuming the reflection is where the work happens. It isn't. The cognitive labor occurs before the first answer. The reflection tokens that follow are mostly the model telling us it was already right.

The Finding

"First Try Matters" analyzes rollouts from 8 reasoning models on 5 mathematical datasets. The result: reflections (the reasoning that occurs after a candidate answer is produced) are predominantly confirmatory. They rarely change the answer.

More counterintuitively: training on longer reflection chains doesn't improve self-correction capability. It improves first-answer quality. The model gets better at being right the first time, not at catching when it's wrong.
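The confirmatory-versus-corrective distinction can be made concrete with a small sketch. It assumes each rollout has already been reduced to the ordered list of candidate answers it produced, plus a gold answer. The function names and the four labels are illustrative assumptions, not the taxonomy used in the source analysis.

```python
from collections import Counter


def label_reflections(candidates, gold):
    """Label each step from one candidate answer to the next.

    `candidates` is the ordered list of answers a rollout produced;
    `gold` is the reference answer. Both are hypothetical inputs.
    """
    labels = []
    for prev, curr in zip(candidates, candidates[1:]):
        if curr == prev:
            labels.append("confirmatory")    # answer unchanged
        elif curr == gold:
            labels.append("corrective")      # wrong -> right
        elif prev == gold:
            labels.append("harmful")         # right -> wrong
        else:
            labels.append("wrong_to_wrong")  # changed, still wrong
    return labels


def summarize(rollouts):
    """Return the share of each label across (candidates, gold) pairs."""
    counts = Counter()
    for candidates, gold in rollouts:
        counts.update(label_reflections(candidates, gold))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


example = [(["12", "12", "12"], "12"), (["7", "9"], "9")]
shares = summarize(example)
```

If reflections are predominantly confirmatory, the "confirmatory" share dominates and the "corrective" share stays small. A rollout with a single candidate answer contributes no reflections at all.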

What This Means

The visible reflection is post hoc. The model has already reasoned to a conclusion through the invisible pre-answer chain. The reflection loop mostly generates confirmation of the conclusion the model has already reached. When the first answer is right, this looks like careful double-checking. When the first answer is wrong, the confirmation loop typically reinforces the error rather than catching it.

This reframes the entire reflection-training literature. We've been optimizing for training data with more reflection steps under the assumption that reflection = self-correction. The finding says: reflection ≈ confirmation. More reflection training = better first answers that need less correction, not better correction capability.

The Evidence from Efficiency

Early stopping (cutting generation once the first plausible candidate answer appears) saves 24.5% of inference tokens with only a 2.9% accuracy loss. If the reflection tokens after the first answer were doing substantive work, cutting them would cost more accuracy than that. They aren't doing that work.
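A minimal sketch of the early-stopping idea, applied after the fact to finished rollouts. It assumes candidate answers are marked with `\boxed{...}` (a common convention in math benchmarks, not something the source specifies) and uses whitespace-separated words as a crude stand-in for tokens. A real deployment would stop during decoding and count tokens with the model's tokenizer.

```python
import re

# Assumption: candidate answers appear as \boxed{...} with no nested braces.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def stop_at_first_answer(rollout):
    """Truncate a rollout right after its first boxed answer.

    Returns (truncated_text, answer); answer is None if nothing is boxed,
    in which case the rollout is returned unchanged.
    """
    match = BOXED.search(rollout)
    if match is None:
        return rollout, None
    return rollout[:match.end()], match.group(1)


def token_savings(rollouts):
    """Fraction of (whitespace-proxy) tokens removed by early stopping."""
    full = sum(len(r.split()) for r in rollouts)
    kept = sum(len(stop_at_first_answer(r)[0].split()) for r in rollouts)
    return 1 - kept / full if full else 0.0


rollout = "Compute 3*4. So \\boxed{12}. Wait, let me verify: 3*4 = 12. Yes, \\boxed{12}."
kept, answer = stop_at_first_answer(rollout)
savings = token_savings([rollout])
```

Comparing the accuracy of the truncated answers against the final answers on the same rollouts gives the accuracy cost to weigh against the tokens saved.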

The Connection

This joins "Does self-revision actually improve reasoning in language models?" in a cluster that challenges the "more reflection = better reasoning" assumption. That note says revision actively hurts. This note says revision mostly doesn't happen at all: it's confirmation theater. Together: the reflection loop is at best neutral and at worst harmful.

The architectural implication: if you want genuine self-correction, you need external critique (see "Does revising your own reasoning actually help or hurt?"). Internal reflection, with the same model judging its own outputs, produces confirmation, not correction.

Post Angle

Platform: Medium (~1000 words). Hook: "We've been watching models think. The thinking isn't where we think it is." Evidence: 8 models, 5 datasets, predominantly confirmatory reflections. Implication: what we're calling self-correction is actually self-confirmation; training on reflection is training better first-pass reasoning. Practical: 24.5% token efficiency win from early stopping.

Original note title

the first answer was right — why reflection in reasoning models is mostly theater