INQUIRING LINE

Fine-tuning an AI reshapes which abilities come out — rarely adding new ones, sometimes quietly erasing old ones.

What happens to base model capabilities when you apply finetuning?

This explores what fine-tuning actually does to the abilities a model already learned in pretraining — whether it adds, sharpens, hides, or quietly damages them.

This reads the question as: when you take a capable base model and fine-tune it, do you get a strictly better model, or do you trade something away? The corpus's surprising consensus is that fine-tuning rarely *creates* raw capability; more often it reshapes how capability gets expressed, and sometimes it degrades what was already there. One useful frame splits the model in two: pretraining scale drives factual knowledge stored in lower layers, while fine-tuning scale drives behavioral helpfulness expressed in upper layers (see "Do pretraining and fine-tuning scale independently in language models?"). So fine-tuning is largely a behavior-shaping operation layered on top of knowledge it didn't put there.

That framing explains a recurring finding about reasoning: the ability is usually already latent in the base model, and RL-style post-training mostly teaches *when* to deploy it rather than *how* to do it. Hybrid models recover 91% of the gains just by routing tokens, and reasoning activation vectors exist before any RL touches the weights (see "Does RL post-training create reasoning or just deploy it?"). But this isn't universal: for standard reasoning, RL activates what's already there, while for complex multi-step planning it can generate genuinely novel strategies the base model can't reach even with heavy sampling (see "Does reinforcement learning create new reasoning abilities or activate existing ones?"). So whether fine-tuning adds capability depends on how far the task sits from what the base model already knows.

The darker side is that fine-tuning can actively corrode base capabilities. Fine-tuning can make a model's chain-of-thought *performative*: the reasoning steps stop causally driving the answer, so the model looks like it's thinking while the explanation has come loose from the output (see "Does fine-tuning disconnect reasoning steps from final answers?"). RL fine-tuning can sharpen memorization and template-matching rather than install real procedures, which collapses on out-of-distribution variants (see "Do fine-tuned language models actually learn optimization procedures?"). Push the training signal too hard, for example with overly difficult RLVR samples, and models learn degenerate shortcuts that contaminate pre-existing skills (see "Do overly hard RLVR samples actually harm model capabilities?"). Fine-tuning also narrows the model: RL converges on a single dominant pretraining format and suppresses the alternatives within the first epoch (see "Does RL training collapse format diversity in pretrained models?"), and preference tuning shifts diversity in a domain-dependent direction, cutting it in code while raising it in creative writing, depending on what each domain rewards (see "Does preference tuning always reduce diversity the same way?").

The most direct answer to "does it forget?" is yes, and the corpus's mitigations are revealing because they all work by *touching fewer weights*. SoftCoT freezes the entire backbone and trains a tiny auxiliary model, so continuous reasoning gets added without disturbing pretrained knowledge (see "Can continuous reasoning avoid forgetting in instruction-tuned models?"). Parameter isolation freezes the core regions each task depends on and only merges the non-core parts, beating standard multi-task fine-tuning (see "Can isolating task-specific parameters prevent multi-task fine-tuning interference?"). Transformer² goes further, tuning only the singular values of weight matrices to produce composable expert vectors that mix at inference without interference (see "Can models dynamically activate expert skills at inference time?"). The pattern across all three: the more you let fine-tuning rewrite the base weights, the more base capability you risk losing, so the frontier is figuring out how to add behavior while leaving the foundation intact.

The thing you might not have expected to learn: fine-tuning's failures often aren't visible as lower accuracy. A model can score the same while its reasoning has quietly detached from its answers, its format diversity has collapsed, or its apparent skill is brittle memorization that shatters the moment the test set shifts. "What happens to capabilities" isn't just a question of how much — it's about what kind of capability survives, and whether the survivor is the real thing or a convincing imitation.

Sources (11 notes)

Do pretraining and fine-tuning scale independently in language models?

Emulated Fine-Tuning reveals that scaling pretraining improves factual knowledge while scaling fine-tuning improves behavioral helpfulness. This decoupling has architectural roots: pretraining enriches lower-layer knowledge storage, while fine-tuning modifies upper-layer behavior expression.
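
The decoupling can be illustrated with the log-probability arithmetic usually associated with emulated fine-tuning: take a large base model's next-token distribution (knowledge) and add the behavioral shift measured on a small model (fine-tuned minus base). The sketch below is a toy under stated assumptions, not the note's own code: the three-token vocabulary, the logits, and the function name `emulated_finetune` are all hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def emulated_finetune(base_large, base_small, tuned_small):
    """Large-base log-probs plus the small model's (tuned - base) delta, renormalized."""
    lb, sb, st = (log_softmax(v) for v in (base_large, base_small, tuned_small))
    combined = [l + (t - s) for l, s, t in zip(lb, sb, st)]
    return [math.exp(x) for x in log_softmax(combined)]

# Hypothetical next-token logits over a 3-token vocabulary.
base_large = [2.0, 1.0, 0.0]    # what large-scale pretraining prefers
base_small = [1.0, 1.0, 0.0]    # small model before fine-tuning
tuned_small = [1.0, 1.0, 2.0]   # fine-tuning boosted token 2

probs = emulated_finetune(base_large, base_small, tuned_small)
no_change = emulated_finetune(base_large, base_small, base_small)
```

When the small model's fine-tuning changes nothing, the result is just the large base distribution; the behavioral delta only reweights what the base model already assigns probability to.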

Does RL post-training create reasoning or just deploy it?

Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.

Does reinforcement learning create new reasoning abilities or activate existing ones?

For standard reasoning tasks, RL activates latent abilities already present in base models. For complex planning requiring multi-step coordination, RL generates genuinely novel strategies inaccessible to base models even with extensive sampling.

Does fine-tuning disconnect reasoning steps from final answers?

Three faithfulness tests show fine-tuned models generate reasoning chains that less reliably influence final outputs. Early termination, paraphrasing, and filler substitution all produce invariant answers more often after fine-tuning, suggesting reasoning becomes performative rather than functional.
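
One way to picture these tests is as an invariance rate: perturb the reasoning chain, re-ask for the answer, and count how often the answer does not move. The sketch below covers early termination and filler substitution only (paraphrasing would need a rewriting model). The two toy answerers, the dataset, and every function name are hypothetical stand-ins for a real model call.

```python
from typing import Callable, List, Tuple

# (question, reasoning steps) -> final answer; stands in for a model call.
Answerer = Callable[[str, List[str]], str]

def truncate(steps: List[str], keep: int = 1) -> List[str]:
    """Early termination: keep only the first `keep` reasoning steps."""
    return steps[:keep]

def filler(steps: List[str]) -> List[str]:
    """Filler substitution: replace every step with a contentless token."""
    return ["..." for _ in steps]

def invariance_rate(answer_fn: Answerer,
                    dataset: List[Tuple[str, List[str]]],
                    perturb: Callable[[List[str]], List[str]]) -> float:
    """Fraction of examples whose answer is unchanged after perturbing the chain."""
    unchanged = sum(
        answer_fn(q, steps) == answer_fn(q, perturb(steps)) for q, steps in dataset
    )
    return unchanged / len(dataset)

def faithful(question: str, steps: List[str]) -> str:
    # Toy model whose answer is read off the last reasoning step.
    return steps[-1].split("=")[-1].strip() if steps else ""

def performative(question: str, steps: List[str]) -> str:
    # Toy model that ignores the chain and answers from the question alone.
    return {"2+3*4": "14", "10-2*3": "4"}.get(question, "")

dataset = [
    ("2+3*4", ["3*4 = 12", "2+12 = 14"]),
    ("10-2*3", ["2*3 = 6", "10-6 = 4"]),
]

faithful_rate = invariance_rate(faithful, dataset, truncate)
performative_rate = invariance_rate(performative, dataset, filler)
```

A higher invariance rate means the chain matters less to the answer, which is the sense in which reasoning is described here as performative.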

Do fine-tuned language models actually learn optimization procedures?

Even GRPO-trained models show sharp performance drops on out-of-distribution variants (N-1 test sets) compared to in-distribution problems, indicating RL optimizes template-matching rather than genuine problem-solving procedures.

Do overly hard RLVR samples actually harm model capabilities?

Training on nearly-impossible problems causes models to learn degenerate shortcuts rather than genuine reasoning, and these shortcuts contaminate pre-existing capabilities. Group-relative normalization treats rare accidental successes as high-advantage trajectories, reinforcing answer repetition and computation-skipping instead of sound reasoning patterns.
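
The normalization effect is easy to see with a small worked example. The sketch below assumes the common group-relative form, advantage = (reward - group mean) / group standard deviation, with binary rewards and the population standard deviation; both the function name and the `eps` term are illustrative choices, not details taken from the note.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward by its group's mean and population std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Nearly impossible problem: one accidental success in eight rollouts.
hard = group_relative_advantages([1, 0, 0, 0, 0, 0, 0, 0])

# Moderately hard problem: four successes in eight rollouts.
moderate = group_relative_advantages([1, 1, 1, 1, 0, 0, 0, 0])

lucky_advantage = hard[0]        # roughly 2.65
typical_advantage = moderate[0]  # roughly 1.0
```

Under these assumptions the single lucky rollout receives well over twice the advantage of a success on the moderate problem, so whatever produced it, sound or not, gets the strongest push.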

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Does preference tuning always reduce diversity the same way?

RLHF reduces lexical-syntactic diversity in code generation but increases it in creative writing. The direction depends on what each domain incentivizes: code rewards convergence toward correct solutions, while creative writing rewards stylistic distinctiveness.
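
A rough way to quantify this kind of shift is a distinct-n ratio: unique n-grams divided by total n-grams across a set of sampled outputs. This is a common lexical diversity proxy swapped in for illustration, not necessarily the metric behind the note, and the sample strings below are invented.

```python
from typing import List

def distinct_n(samples: List[str], n: int = 2) -> float:
    """Unique n-grams divided by total n-grams across whitespace-tokenized samples."""
    total = 0
    unique = set()
    for text in samples:
        tokens = text.split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0

# Hypothetical samples: code outputs that converged on one solution...
converged = ["return a + b", "return a + b", "return a + b"]
# ...versus creative outputs that stay stylistically distinct.
varied = [
    "the rain fell softly",
    "a storm broke loose",
    "quiet snow covered everything",
]

code_diversity = distinct_n(converged)
creative_diversity = distinct_n(varied)
```

Comparing the ratio before and after preference tuning, per domain, would show the direction of the change rather than assuming it is always a reduction.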

Can continuous reasoning avoid forgetting in instruction-tuned models?

SoftCoT avoids catastrophic forgetting by keeping the main LLM frozen while delegating soft thought generation to a small auxiliary model. This architectural separation maintains pre-trained knowledge while enabling continuous reasoning.

Can isolating task-specific parameters prevent multi-task fine-tuning interference?

Research shows that identifying core parameter regions per task, clustering overlapping tasks, and freezing core parameters while geometrically merging non-core parameters consistently outperforms standard multi-task fine-tuning. Temporal task scheduling alone proves insufficient without explicit structural parameter isolation.

Can models dynamically activate expert skills at inference time?

Transformer² demonstrates that tuning only singular values within weight matrices produces composable expert vectors that dynamically mix at inference without interference, outperforming LoRA with fewer parameters and enabling continual specialization.
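
The idea of tuning only singular values can be sketched in a few lines of NumPy: decompose a frozen weight matrix once, then let a small vector rescale its singular values. This is a minimal illustration under assumptions, not the paper's implementation: the random matrix stands in for a pretrained weight, and the expert vectors `z_math` and `z_code` and the 50/50 mix are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))  # stand-in for a frozen pretrained weight matrix

# Decompose once; U and Vt stay fixed, only the singular values are rescaled.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

def adapt(z: np.ndarray) -> np.ndarray:
    """Rebuild the weight with singular values scaled elementwise by expert vector z."""
    return U @ np.diag(s * z) @ Vt

# Hypothetical expert vectors, one scalar per singular value.
z_math = np.array([1.2, 1.0, 0.8])
z_code = np.array([0.9, 1.1, 1.0])

W_identity = adapt(np.ones_like(s))           # z = 1 recovers the base weights
W_mixed = adapt(0.5 * z_math + 0.5 * z_code)  # blend experts at inference time

trainable = z_math.size  # 3 scalars, versus W.size = 12 full weights
```

Because the rebuilt weight is linear in `z`, blending expert vectors is the same as blending the adapted matrices, and setting `z` to all ones leaves the base model untouched.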

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining (4.26 match)
Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces! (3.38 match)
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models (3.38 match)
On the Impact of Fine-Tuning on Chain-of-Thought Reasoning (3.35 match)
The Invisible Leash: Why RLVR May Not Escape Its Origin (2.50 match)
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? (2.50 match)
Eliciting Reasoning in Language Models with Cognitive Tools (1.73 match)
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models (1.73 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about fine-tuning's effect on base model capabilities. The question remains open: when you fine-tune a capable base model, do you strictly improve it, or do you trade capability for behavior—and can you measure the real cost?

What a curated library found — and when (dated claims, not current truth):
Findings span Oct 2023–May 2026. A library of recent work reports:
- Fine-tuning is primarily a *behavior-shaping* operation on latent knowledge installed during pretraining, not a capability builder; reasoning RL activates existing ability rather than creates it, recovering ~91% of gains through token routing alone (~2025).
- Fine-tuning can *degrade* base capabilities silently: chain-of-thought reasoning becomes performative (explanations decoupled from outputs), memorization and template-matching replace real procedures, and RL collapses format diversity within the first epoch (~2025).
- Hard RLVR samples induce degenerate shortcuts that contaminate pre-existing skills; the mitigation pattern across SoftCoT, parameter isolation, and Transformer² is to touch *fewer weights*—freezing backbone, isolating non-core parameters, or tuning only singular values (~2025).
- Multi-task fine-tuning interference occurs in core parameter regions; diversity losses are domain-dependent (code shrinks, creative writing expands) (~2025–2026).

Anchor papers (verify; mind their dates):
- arXiv:2411.15382 (2024-11): On the Impact of Fine-Tuning on Chain-of-Thought Reasoning
- arXiv:2501.06252 (2025-01): Transformer²: Self-adaptive LLMs
- arXiv:2502.12134 (2025-02): SoftCoT: Soft Chain-of-Thought
- arXiv:2605.28388 (2026-05): Mechanistically Interpreting Sample Difficulty in RLVR

Your task:
(1) RE-TEST EACH CONSTRAINT. For each finding above, judge whether newer models (e.g., o1, o3, reasoning-at-scale), training methods (e.g., distillation from reasoning models, multi-stage fine-tuning), tooling (e.g., LoRA variants, parameter-efficient adapters), or orchestration (e.g., mixture-of-experts, dynamic routing) have since relaxed or overturned it. Separate the durable question—does fine-tuning trade capability for behavior?—from perishable limitations (e.g., "RL must rewrite all weights"). Say plainly where constraints still hold.
(2) Surface the strongest *contradicting or superseding* work from the last ~6 months: papers that show fine-tuning *does* create capability, or that the behavior/capability trade-off has dissolved.
(3) Propose 2 research questions that assume the regime may have moved: e.g., "If reasoning is now pre-computed at scale, does fine-tuning's role shift from activation to *compression*?" or "Can fine-tuning restore diversity lost in prior RL, or is it a one-way gate?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
