Why does instruction tuning hurt knowledge-intensive tasks more than reasoning tasks?
This explores why fine-tuning a model on instructions tends to erode its stored factual knowledge while leaving (or even sharpening) its ability to follow reasoning patterns — and the corpus suggests the answer is about what instruction tuning actually changes inside the model.
The cleanest clue in the corpus is that instruction tuning mostly teaches a model *how to format an answer*, not new content. In one striking result, models trained on semantically empty or even deliberately wrong instructions performed about as well as models trained on correct ones (43% versus a 42.6% random baseline), because what actually transferred was knowledge of the output space, not understanding of the task (see "Does instruction tuning teach task understanding or output format?"). If tuning largely changes the *shape* of responses, then it pulls the model's weights toward a particular style and structure, and that pull is exactly what can overwrite the delicate, distributed traces where factual recall lives.
Reasoning, by contrast, seems to be more about reproducing *form*, and form is what instruction tuning reinforces. Chain-of-thought, on close inspection, works by constraining models to replay familiar reasoning templates learned in training rather than performing fresh symbolic inference (see "Does chain-of-thought reasoning reveal genuine inference or pattern matching?"). A related finding shows that reasoning success depends on whether the model has seen similar *instances* before, not on the abstract difficulty of the task (see "Do language models fail at reasoning due to complexity or novelty?"). So a reasoning task is essentially a procedure to imitate, and tuning toward output structure happens to strengthen exactly that imitative scaffolding. Knowledge tasks have no such scaffolding to lean on; they need a specific fact retrieved intact, and nothing about format-shaping protects that fact.
The forgetting angle makes this concrete. Continuous-reasoning work explicitly frames the danger: fine-tuning the backbone risks catastrophic forgetting of pre-trained knowledge, which is why one approach freezes the main model entirely and delegates new behavior to a small auxiliary module, preserving stored knowledge while still gaining reasoning ability (see "Can continuous reasoning avoid forgetting in instruction-tuned models?"). That architectural choice only makes sense if updating the weights is what damages knowledge. The same theme shows up in faithfulness studies: after fine-tuning, a model's reasoning chains drive its final answer less reliably, so the reasoning becomes performative rather than functional (see "Does fine-tuning disconnect reasoning steps from final answers?"). In other words, fine-tuning trades substance for surface, and knowledge is the substance with the most to lose.
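The freeze-the-backbone idea can be pictured with a deliberately tiny sketch. This is a toy, not the method from the cited note: the linear "backbone", the additive auxiliary module, and every name below are assumptions made up for illustration. It only shows the structural point that when training updates nothing but the auxiliary weights, the backbone's own outputs cannot drift.

```python
# Toy illustration only. The names and the linear/additive design are
# hypothetical; they are not taken from any cited work.

def backbone(x, frozen_weights):
    """Fixed linear map standing in for the pre-trained model's stored knowledge."""
    return sum(w * xi for w, xi in zip(frozen_weights, x))

def auxiliary(x, aux_weights):
    """Small trainable module whose output is added to the backbone's output."""
    return sum(w * xi for w, xi in zip(aux_weights, x))

def predict(x, frozen_weights, aux_weights):
    return backbone(x, frozen_weights) + auxiliary(x, aux_weights)

def train_auxiliary(data, frozen_weights, aux_weights, lr=0.1, steps=200):
    """Gradient descent on squared error, updating the auxiliary weights only."""
    aux = list(aux_weights)
    for _ in range(steps):
        for x, target in data:
            error = predict(x, frozen_weights, aux) - target
            # d(error^2)/d(aux_i) = 2 * error * x_i; the backbone gets no update.
            aux = [w - lr * 2 * error * xi for w, xi in zip(aux, x)]
    return aux

frozen = [1.0, -2.0]        # "pre-trained" weights, never modified
snapshot = list(frozen)     # copy kept to verify nothing changed
new_task = [([1.0, 0.0], 3.0), ([0.0, 1.0], -1.0)]
aux = train_auxiliary(new_task, frozen, [0.0, 0.0])

print("backbone unchanged:", frozen == snapshot)
print("new-task prediction:", predict([1.0, 0.0], frozen, aux))
```

The combined model fits the new task, while `backbone` still returns exactly what it did before training. Full fine-tuning offers no such guarantee, because the same weights that store old behavior are the ones being moved.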
There's a useful twist hiding here. Scaling reasoning through SFT and RL doesn't just spare reasoning; it can actively *cost* something else. Longer chains of thought dilute the model's attention to the original instruction, degrading instruction-following (see "Why do better reasoning models ignore instructions?"). So the picture isn't 'tuning is bad,' it's that different capabilities sit in different places. Format and procedure are cheap to teach and reinforce; factual knowledge is expensive to preserve and easy to smear when you move the weights around. That's why a smart non-expert's intuition, 'tuning should make everything better,' quietly fails: you're not adding knowledge, you're re-sculpting a surface, and re-sculpting tends to erase something underneath.
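One way to picture the dilution effect is a back-of-the-envelope softmax calculation. This is an illustrative assumption, not the mechanism measured in the cited benchmark: it supposes every instruction token gets the same attention logit (`logit_bonus`, a hypothetical parameter) and every reasoning token gets logit 0, then asks what share of attention mass is left on the instruction as the chain grows.

```python
import math

def instruction_attention_share(n_instruction, n_reasoning, logit_bonus=0.0):
    """Softmax attention mass on instruction tokens under a toy assumption:
    each instruction token has logit `logit_bonus`, each reasoning token logit 0."""
    instruction_mass = n_instruction * math.exp(logit_bonus)
    total_mass = instruction_mass + n_reasoning  # exp(0) == 1 per reasoning token
    return instruction_mass / total_mass

# A 20-token instruction followed by progressively longer reasoning chains.
chain_lengths = (100, 1000, 10000)
shares = [instruction_attention_share(20, m, logit_bonus=1.0) for m in chain_lengths]

for m, share in zip(chain_lengths, shares):
    print(f"{m:>6} reasoning tokens -> instruction share {share:.3f}")
```

Under these assumptions the instruction's share shrinks steadily as the chain lengthens, even though the instruction tokens are individually favored. Real attention patterns are far less uniform, so treat this as intuition only.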
Sources (6 notes)
Models trained on semantically empty or deliberately incorrect instructions perform comparably to those trained on full, correct instructions, reaching 43% against a random baseline of 42.6%. The semantic content of instructions appears largely irrelevant; what transfers is knowledge of the output space.
CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.
Large reasoning models (LRMs) don't break at complexity thresholds but at instance-novelty boundaries. Models fit instance-based patterns rather than generalizable algorithms, so a reasoning chain of any length can succeed if the model was trained on similar instances.
SoftCoT avoids catastrophic forgetting by keeping the main LLM frozen while delegating soft thought generation to a small auxiliary model. This architectural separation maintains pre-trained knowledge while enabling continuous reasoning.
Three faithfulness tests show fine-tuned models generate reasoning chains that less reliably influence final outputs. Early termination, paraphrasing, and filler substitution all produce invariant answers more often after fine-tuning, suggesting reasoning becomes performative rather than functional.
The MathIF benchmark shows that SFT and RL training improve reasoning but reduce instruction adherence, particularly as chain-of-thought length increases. Longer reasoning chains create contextual distance that dilutes the model's attention to original instructions.