INQUIRING LINE


Training an AI to follow instructions mostly teaches it style — and that reshaping quietly overwrites the facts it learned.

Why does instruction tuning hurt knowledge-intensive tasks more than reasoning tasks?

This explores why fine-tuning a model on instructions tends to erode its stored factual knowledge while leaving (or even sharpening) its ability to follow reasoning patterns — and the corpus suggests the answer is about what instruction tuning actually changes inside the model.

The cleanest clue in the corpus is that instruction tuning mostly teaches a model *how to format an answer*, not new content. In one striking result, models trained on semantically empty or even deliberately wrong instructions performed about as well as models trained on correct ones (43% versus a 42.6% random baseline), because what actually transferred was knowledge of the output space, not understanding of the task (see "Does instruction tuning teach task understanding or output format?"). If tuning is largely changing the *shape* of responses, then it pulls the model's weights toward producing a certain style and structure, and that pull is exactly what can overwrite the delicate, distributed traces where factual recall lives.

Reasoning, by contrast, seems to be more about reproducing *form*, and form is what instruction tuning reinforces. Chain-of-thought, on close inspection, works by constraining models to replay familiar reasoning templates learned in training rather than performing fresh symbolic inference (see "Does chain-of-thought reasoning reveal genuine inference or pattern matching?"). A related finding shows reasoning success depends on whether the model has seen similar *instances* before, not on the abstract difficulty of the task (see "Do language models fail at reasoning due to complexity or novelty?"). So a reasoning task is essentially a procedure to imitate, and tuning toward output structure happens to strengthen exactly that imitative scaffolding. Knowledge tasks have no such scaffolding to lean on; they need a specific fact retrieved intact, and there is nothing about format-shaping that protects that fact.

The forgetting angle makes this concrete. Continuous-reasoning work explicitly frames the danger: when you fine-tune the backbone, you risk catastrophic forgetting of pre-trained knowledge, which is why one approach (SoftCoT) freezes the main model entirely and delegates new behavior to a small auxiliary model, preserving stored knowledge while still gaining reasoning ability (see "Can continuous reasoning avoid forgetting in instruction-tuned models?"). That architectural choice only makes sense if updating the weights is what damages knowledge. The same theme shows up in faithfulness studies: after fine-tuning, a model's reasoning chains less reliably drive its final answer, so the reasoning becomes performative rather than functional (see "Does fine-tuning disconnect reasoning steps from final answers?"). In other words, fine-tuning trades substance for surface, and knowledge is the substance with the most to lose.
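The faithfulness studies mentioned above work by perturbing a reasoning chain and checking whether the final answer moves. The following is a minimal sketch of two such checks (early termination and filler substitution), assuming a hypothetical `model(question, steps)` callable that returns an answer string. The two toy models are illustrative stand-ins, not real fine-tuned systems, and the scoring here is a simplification rather than the metric used in the cited work.

```python
from typing import Callable, List

# Hypothetical interface: a "model" maps (question, reasoning steps) to an answer.
Model = Callable[[str, List[str]], str]


def early_termination_invariance(model: Model, question: str, steps: List[str]) -> float:
    """Fraction of truncated chains whose answer equals the full-chain answer.

    Near 1.0: the answer barely depends on the reasoning (performative).
    Near 0.0: cutting the reasoning short changes the answer (functional).
    """
    if not steps:
        raise ValueError("need at least one reasoning step")
    full_answer = model(question, steps)
    # k = 0 .. len(steps) - 1, so every chain tested is strictly shorter than the full one.
    unchanged = sum(model(question, steps[:k]) == full_answer for k in range(len(steps)))
    return unchanged / len(steps)


def filler_invariance(model: Model, question: str, steps: List[str], filler: str = "...") -> bool:
    """True if replacing every step with filler text leaves the answer unchanged."""
    return model(question, [filler] * len(steps)) == model(question, steps)


def reasoning_dependent(question: str, steps: List[str]) -> str:
    # Toy model: reads its answer off the last reasoning step.
    return steps[-1].split("=")[-1].strip() if steps else "unknown"


def reasoning_ignoring(question: str, steps: List[str]) -> str:
    # Toy model: answer is fixed regardless of the reasoning it is shown.
    return "12"


steps = ["3 + 4 = 7", "7 + 5 = 12"]
for name, m in [("dependent", reasoning_dependent), ("ignoring", reasoning_ignoring)]:
    print(name,
          early_termination_invariance(m, "What is 3 + 4 + 5?", steps),
          filler_invariance(m, "What is 3 + 4 + 5?", steps))
```

Under these assumptions, a model whose answers survive truncation and filler substitution is one whose stated reasoning is not driving the output, which is the pattern the note reports becoming more common after fine-tuning.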

There's a useful twist hiding here. Scaling reasoning through SFT and RL doesn't just spare reasoning; it can actively *cost* something else: longer chains of thought dilute the model's attention to the original instruction, degrading instruction-following (see "Why do better reasoning models ignore instructions?"). So the picture isn't 'tuning is bad'; it's that different capabilities sit in different places. Format and procedure are cheap to teach and reinforce; factual knowledge is expensive to preserve and easy to smear when you move the weights around. That's why a smart non-expert's intuition, 'tuning should make everything better', quietly fails: you're not adding knowledge, you're re-sculpting a surface, and re-sculpting risks erasing what lies underneath.
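The claim that stored knowledge is easy to smear when weights move, and the frozen-backbone alternative described earlier, can be illustrated with a deliberately tiny toy. This is a sketch under strong assumptions: a single scalar weight `w` stands in for the backbone, a scalar `a` stands in for an auxiliary module that is only active on the new task, and the two "tasks" are made-up linear targets. It is not SoftCoT or any method from the cited papers.

```python
from typing import List, Tuple

XS: List[float] = [1.0, 2.0, 3.0]  # toy inputs shared by both tasks


def mse(pairs: List[Tuple[float, float]]) -> float:
    return sum((pred - target) ** 2 for pred, target in pairs) / len(pairs)


def task_a_loss(w: float) -> float:
    # "Old knowledge": target is y = 2x; the auxiliary path is inactive here.
    return mse([(w * x, 2.0 * x) for x in XS])


def task_b_loss(w: float, a: float) -> float:
    # "New behavior": target is y = 3x; the auxiliary term a * x is active.
    return mse([((w + a) * x, 3.0 * x) for x in XS])


def train_on_task_b(w: float, a: float, train_backbone: bool,
                    lr: float = 0.05, steps: int = 200) -> Tuple[float, float]:
    """Gradient descent on task B, updating either the backbone w or the auxiliary a."""
    for _ in range(steps):
        # The task-B loss has the same gradient with respect to w and to a.
        grad = sum(2.0 * ((w + a) * x - 3.0 * x) * x for x in XS) / len(XS)
        if train_backbone:
            w -= lr * grad   # full fine-tuning: the backbone weight moves
        else:
            a -= lr * grad   # frozen backbone: only the auxiliary parameter moves
    return w, a


w_pre = 2.0  # "pre-trained" backbone that solves task A exactly
w_full, a_full = train_on_task_b(w_pre, 0.0, train_backbone=True)
w_frozen, a_frozen = train_on_task_b(w_pre, 0.0, train_backbone=False)

print("task A loss after full fine-tuning:", task_a_loss(w_full))
print("task A loss with frozen backbone:  ", task_a_loss(w_frozen))
print("task B loss (full, frozen):        ",
      task_b_loss(w_full, a_full), task_b_loss(w_frozen, a_frozen))
```

In this toy both variants learn the new task, but only the frozen variant leaves the old task's loss untouched, because the parameter that encodes the old behavior never moves. Real models are far less cleanly separable, so this shows the logic of the design choice rather than its measured effect.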

Sources (6 notes)

Does instruction tuning teach task understanding or output format?

Models trained on semantically empty or deliberately incorrect instructions perform comparably to those trained on full, correct instructions: 43% versus a 42.6% random baseline. The semantic content of instructions appears largely irrelevant; what transfers is knowledge of the output space.

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.

Do language models fail at reasoning due to complexity or novelty?

LRMs don't break at complexity thresholds but at instance-novelty boundaries. Models fit instance-based patterns rather than generalizable algorithms, so any reasoning chain succeeds if trained on similar instances, regardless of length.

Can continuous reasoning avoid forgetting in instruction-tuned models?

SoftCoT avoids catastrophic forgetting by keeping the main LLM frozen while delegating soft thought generation to a small auxiliary model. This architectural separation maintains pre-trained knowledge while enabling continuous reasoning.

Does fine-tuning disconnect reasoning steps from final answers?

Three faithfulness tests show fine-tuned models generate reasoning chains that less reliably influence final outputs. Early termination, paraphrasing, and filler substitution all produce invariant answers more often after fine-tuning, suggesting reasoning becomes performative rather than functional.


Why do better reasoning models ignore instructions?

The MathIF benchmark shows that SFT and RL training improve reasoning but reduce instruction adherence, particularly as chain-of-thought length increases. Longer reasoning chains create contextual distance that dilutes the model's attention to original instructions.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens (2.60 match)
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models (1.75 match)
Measuring Faithfulness in Chain-of-Thought Reasoning (1.74 match)
On the Impact of Fine-Tuning on Chain-of-Thought Reasoning (1.72 match)
Are Emergent Abilities in Large Language Models just In-Context Learning? (1.71 match)
When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs (1.70 match)
A Comment On "The Illusion of Thinking": Reframing the Reasoning Cliff as an Agentic Gap (1.70 match)
Instruction Induction: From Few Examples to Natural Language Task Descriptions (1.69 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing whether instruction tuning's documented harm to knowledge-intensive tasks still holds, treating prior findings as dated claims to refresh.

WHAT A CURATED LIBRARY FOUND — AND WHEN (findings span 2023–2026; these are perishable claims, not current truth):
• Instruction tuning primarily teaches output-format distribution, not task semantics; models trained on wrong instructions performed ~43% vs. 42.6% baseline, suggesting knowledge of output space, not understanding, transfers (2023–24).
• Chain-of-thought succeeds via constrained imitation of familiar reasoning templates rather than fresh symbolic inference; reasoning depends on instance-level familiarity, not abstract task difficulty (2024–2025).
• Fine-tuning degrades CoT faithfulness independently of accuracy; reasoning becomes performative rather than functional, driven by weight updates that smear pre-trained factual knowledge (2024–2025).
• Freezing the backbone and delegating new behavior to auxiliary modules preserves knowledge while enabling reasoning; scaling reasoning through SFT/RL incurs instruction-following deficits (2025–2026).

ANCHOR PAPERS (verify; mind their dates):
• arXiv:2305.11383 (2023) — Do Models Really Learn to Follow Instructions?
• arXiv:2411.15382 (2024) — On the Impact of Fine-Tuning on Chain-of-Thought Reasoning
• arXiv:2502.12134 (2025) — SoftCoT: Soft Chain-of-Thought for Efficient Reasoning
• arXiv:2506.02878 (2025) — CoT is Not True Reasoning, It Is Just a Tight Constraint to Imitate

YOUR TASK:
(1) RE-TEST EACH CONSTRAINT. For the four findings above, judge whether newer model scaling, architectural innovations (e.g., adapter-based tuning, LoRA variants, mixture-of-experts), multi-turn memory, or improved faithfulness metrics have since RELAXED or OVERTURNED them. Separate the durable question—does tuning trade knowledge for surface?—from the perishable limitation—e.g., does SFT necessarily degrade CoT fidelity? Cite what resolved it; flag where each still holds.
(2) SURFACE THE STRONGEST CONTRADICTING OR SUPERSEDING WORK from the last ~6 months—any paper claiming tuning *preserves* or *enhances* knowledge retention, or showing CoT is *not* mere imitation, or demonstrating instruction-following *scales* with reasoning.
(3) PROPOSE 2 research questions that assume the regime may have moved: e.g., does architectural separation (e.g., frozen embeddings + learned reasoning heads) eliminate the knowledge-reasoning tradeoff? Can contrastive or retrieval-augmented tuning protect factual knowledge while boosting reasoning?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
