Does every correct chain-of-thought trace improve fine-tuning?
Are all answer-correct reasoning traces equally valuable for training? This note explores whether some correct traces contain reasoning that harms model learning despite reaching the right answer.
The standard assumption behind distilling long chain-of-thought traces into a smaller model via SFT is that a trace is useful supervision once its final answer is correct. "Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces" (2605.29288) breaks that assumption. It identifies post-conclusion continuation: a segment where the answer is already sufficiently supported, but the trace keeps reasoning — and that tail, even though it preserves the correct answer, is harmful to train on. A delete-only editor that excises the post-conclusion suffix while keeping the answer produces measurably better SFT than training on the full trace. The authors name the empirically confirmed phenomenon harmful continuation and ship a lightweight boundary proxy, Harmful Continuation Cut (HCC), that approximates where useful reasoning ends.
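The delete-only edit described above can be illustrated with a minimal sketch. It assumes a trace is already split into ordered steps and that some detector has supplied a boundary index; the `Trace` type, the function names, and the `Answer:` serialization are hypothetical and are not taken from the paper. The note does not specify how HCC computes the boundary, so the index is simply passed in.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trace:
    steps: List[str]  # reasoning steps, in order (hypothetical representation)
    answer: str       # final answer text, kept verbatim


def delete_only_edit(trace: Trace, boundary: int) -> Trace:
    """Keep steps[:boundary] and the answer; delete the rest.

    `boundary` is an assumed input from some boundary detector. Nothing is
    rewritten or reordered: the edit only removes the post-conclusion suffix.
    """
    if not 0 <= boundary <= len(trace.steps):
        raise ValueError("boundary must lie within the trace")
    return Trace(steps=trace.steps[:boundary], answer=trace.answer)


def to_sft_target(trace: Trace) -> str:
    """Serialize a trace into one training target string (assumed format)."""
    return "\n".join(trace.steps + [f"Answer: {trace.answer}"])


full = Trace(
    steps=[
        "Let x = 3.",
        "Then 2x = 6.",
        "Wait, let me re-check this.",
        "Maybe try another approach.",
    ],
    answer="6",
)
cut = delete_only_edit(full, boundary=2)
print(to_sft_target(cut))
```

The point of keeping the edit delete-only is that the surviving text is still the teacher's own reasoning and answer, so any change in SFT quality can be attributed to the removed suffix rather than to rewritten content.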
The diagnostic move is what makes this distinct. The harmful tail is characterized by an uncertainty–geometry mismatch: persistent local uncertainty (the model keeps exploring as if unsettled) combined with weakened terminal-directional hidden-state progress (the exploration no longer moves the representation toward the answer). That mismatch, not length itself, is the signature. A random-cut baseline that removes a length-matched suffix without identifying where reasoning concluded performs far worse (average 29.0 vs. 49.3 for HCC across MATH500, AMC23, and GSM8K), which indicates that the gain comes from cutting the right segment rather than from shorter outputs.
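One way to make the mismatch concrete is the sketch below. It is not the paper's HCC procedure, which this note does not spell out. It assumes per-step uncertainty is available as an entropy value, that one hidden-state vector is recorded before the first step and after each step, and that "terminal-directional progress" can be approximated by projecting each step's hidden-state update onto the unit direction from the first to the last state. The function names and both thresholds are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def terminal_progress(hidden: List[Vector]) -> List[float]:
    """Project each step's update onto the start-to-final direction.

    `hidden` holds len(steps) + 1 states; the result has one value per step.
    """
    start, final = hidden[0], hidden[-1]
    direction = [f - s for f, s in zip(final, start)]
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        return [0.0] * (len(hidden) - 1)
    unit = [d / norm for d in direction]
    progress = []
    for prev, curr in zip(hidden, hidden[1:]):
        update = [c - p for c, p in zip(curr, prev)]
        progress.append(sum(u * v for u, v in zip(update, unit)))
    return progress


def mismatch_cut(
    entropy: Sequence[float],
    hidden: List[Vector],
    min_entropy: float,   # hypothetical threshold for "still uncertain"
    max_progress: float,  # hypothetical threshold for "no longer progressing"
) -> Optional[int]:
    """Return the first step of the trailing run where uncertainty stays high
    while terminal-directional progress stays low, or None if no such run."""
    progress = terminal_progress(hidden)
    if len(progress) != len(entropy):
        raise ValueError("need one hidden state more than entropy values")
    cut = len(entropy)
    for t in range(len(entropy) - 1, -1, -1):
        if entropy[t] >= min_entropy and progress[t] <= max_progress:
            cut = t
        else:
            break
    return cut if cut < len(entropy) else None


# Toy data: two confident, productive steps followed by three wandering ones.
entropy = [0.2, 0.3, 1.1, 1.2, 1.0]
hidden = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 0.1], [2.0, 0.0], [2.05, 0.1]]
cut = mismatch_cut(entropy, hidden, min_entropy=0.8, max_progress=0.2)
print(cut)  # prints 2
```

Note that the criterion here depends on both signals at once: a long suffix with low entropy, or a high-entropy suffix that still moves toward the final state, is not flagged, which mirrors the claim that length alone is not the signature.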
This sits beside but does not duplicate the vault's existing trace-quality findings. It is not the faithfulness decay of "Does fine-tuning disconnect reasoning steps from final answers?", nor the benchmark-vs-quality divergence of "Does supervised fine-tuning improve reasoning or just answers?". Both of those describe what fine-tuning does to a model, whereas harmful continuation is a property of the training data itself. It sharpens the correlation in "Why do correct reasoning traces contain fewer tokens?": shorter-is-better holds, but the causal lever is removing post-conclusion exploration, not length per se. And it gives a data-curation counterpart to "Can reasoning steps be dynamically pruned without losing accuracy?": redundancy that is steerable at inference is also deletable at training time.
Relevant Notes
- Why do correct reasoning traces contain fewer tokens? — sharpens the correlation: the causal lever is cutting post-conclusion exploration, not length
- Does supervised fine-tuning improve reasoning or just answers? — complementary failure mode: this is data-side, the trap is model-side
- Does fine-tuning disconnect reasoning steps from final answers? — another way answer-correct traces mislead SFT
- Can reasoning steps be dynamically pruned without losing accuracy? — redundancy steerable at inference is deletable at training time
Inquiring lines that use this note as a source
This note is a source for these synthesized inquiries.
- Why are shorter reasoning traces more reliable than longer correct ones?
- What makes some reasoning traces better supervision than others despite equal accuracy?
- How does supervised fine-tuning degrade chain-of-thought faithfulness over time?
- Why do reasoning traces fail to accurately reflect model decision-making?
- How much of chain-of-thought reasoning actually diverges from the final answer?
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces
- Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!
- Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think
- Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens
- Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains
- AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
- What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
- Do Cognitively Interpretable Reasoning Traces Improve LLM Performance?
Original note title
answer-correct chain-of-thought traces can still harm SFT — reasoning that continues after the answer is supported is low-value supervision and deleting it improves fine-tuning