INQUIRING LINE

Pretraining fills an AI's knowledge; finetuning shapes how it behaves — and they improve completely different parts of the model.

How do finetuning and pretraining improvements differ in their effects on model capabilities?

This explores how improvements at the pretraining stage versus the finetuning stage change what a model can actually do, and the most useful frame the corpus offers is a division of labor: pretraining builds the *knowledge*, finetuning shapes the *behavior*. The cleanest statement of this comes from work showing that scaling pretraining improves factual accuracy while scaling finetuning improves helpfulness, and that the split has architectural roots, with pretraining enriching lower-layer knowledge storage and finetuning modifying how upper layers express it [Do pretraining and fine-tuning scale independently in language models?]. So the two stages aren't doing more or less of the same thing; they're operating on different parts of the model and producing different kinds of gains.
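One way to make this division of labor concrete is the log-probability arithmetic behind Emulated Fine-Tuning, the method named in the source note for this claim: take next-token log-probabilities from a large pretrained model and add the behavioral shift that finetuning produced in a small model. The sketch below is a toy illustration only. It assumes a three-token vocabulary and made-up probabilities, and the function names are hypothetical rather than taken from any released code.

```python
import math

def log_softmax(scores):
    """Normalize unnormalized log-scores into log-probabilities."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def emulate_finetuning(logp_large_base, logp_small_base, logp_small_ft):
    """Add the small model's behavior delta (finetuned minus base)
    to the large base model's log-probabilities, then renormalize."""
    combined = [
        lb + (sf - sb)
        for lb, sb, sf in zip(logp_large_base, logp_small_base, logp_small_ft)
    ]
    return log_softmax(combined)

# Toy next-token distributions over a 3-token vocabulary (made-up numbers).
large_base = [math.log(p) for p in (0.6, 0.3, 0.1)]   # knowledge at large scale
small_base = [math.log(p) for p in (0.4, 0.4, 0.2)]   # small model before finetuning
small_ft = [math.log(p) for p in (0.2, 0.4, 0.4)]     # small model after finetuning

emulated = [
    math.exp(lp)
    for lp in emulate_finetuning(large_base, small_base, small_ft)
]
print(emulated)
```

In this toy case the small finetuned model moved probability away from the first token and toward the third, and the same shift is applied on top of the large model's distribution without touching the large model's weights.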

The striking implication is that finetuning often *surfaces* capability rather than *creating* it. Several notes converge here from different angles. Instruction tuning, for instance, seems to teach a model the shape of the expected output rather than any new understanding of the task: models trained on semantically empty or even deliberately wrong instructions perform about as well as those trained on correct ones [Does instruction tuning teach task understanding or output format?]. RL post-training tells a parallel story: base models already contain reasoning ability in latent form, and RL mostly optimizes *when* to deploy it rather than installing anything new, with hybrid models recovering most of the gains by routing tokens alone [Does RL post-training create reasoning or just deploy it?]. Even mechanically, RL updates only a sparse 5–30% of parameters and works largely by suppressing wrong trajectories rather than building new ones [What actually changes inside a model during RL training?].

This is why finetuning improvements scale so differently from pretraining ones. Finetuning follows a multiplicative scaling law where a *larger base model* helps far more than more pretraining data, and adding parameter-efficient tuning parameters helps very little: you're amplifying what pretraining already laid down, not adding fresh knowledge [How should finetuning scale with model and data size?]. And because finetuning is editing behavior on top of fixed knowledge, you don't even need to touch the weights to get the effect: intervening on frozen hidden representations is 10–50x more parameter-efficient than LoRA [Can editing hidden representations beat weight updates for finetuning?].
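The "multiplicative" part of that scaling law can be made concrete with a small numeric sketch. It assumes the law has the form loss = A / (X^alpha * D^beta) + E, where X is whichever pretraining-side factor is being scaled (model size, for example) and D is the amount of finetuning data. That exact form, the function names, and every coefficient below are illustrative assumptions, not fitted values from the cited work.

```python
def finetune_loss(scale, ft_tokens, A, alpha, beta, E):
    """Assumed multiplicative power law: the two factors multiply,
    and E is the irreducible loss floor."""
    return A / (scale ** alpha * ft_tokens ** beta) + E

# Hypothetical coefficients, invented for illustration only.
COEFFS = dict(A=100.0, alpha=0.30, beta=0.10, E=1.5)

def excess(scale, ft_tokens):
    """Loss above the irreducible floor E."""
    return finetune_loss(scale, ft_tokens, **COEFFS) - COEFFS["E"]

# Fraction of excess loss that remains after 4x more finetuning data,
# measured at two different values of the pretraining-side factor.
ratio_small = excess(1e9, 4e6) / excess(1e9, 1e6)
ratio_large = excess(16e9, 4e6) / excess(16e9, 1e6)

# Absolute loss reduction bought by the same 4x finetuning data.
gain_small = excess(1e9, 1e6) - excess(1e9, 4e6)
gain_large = excess(16e9, 1e6) - excess(16e9, 4e6)

print(ratio_small, ratio_large, gain_small, gain_large)
```

Because the two factors multiply, the fraction of excess loss removed by extra finetuning data is the same at every value of X, while the absolute gain it buys depends on X. Which factor matters more in practice depends on the fitted exponents, which are invented here.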

The corpus is also blunt about finetuning's failure modes, which differ in character from pretraining's. Because finetuning reshapes expression rather than understanding, it can make reasoning *performative*: fine-tuned models produce chains of thought that less reliably drive the final answer [Does fine-tuning disconnect reasoning steps from final answers?]. Push RL too hard on impossible problems and it amplifies degenerate shortcuts that contaminate pre-existing capability [Do overly hard RLVR samples actually harm model capabilities?]. RL fine-tuning can sharpen template-matching that collapses on out-of-distribution variants, revealing memorization rather than learned procedure [Do fine-tuned language models actually learn optimization procedures?]. And RL tends to collapse the rich format diversity pretraining provided down to a single dominant style [Does RL training collapse format diversity in pretrained models?].
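The hard-sample failure mode is easier to see in the arithmetic of group-relative normalization, which the source note on overly hard RLVR samples identifies as the mechanism. The sketch below is a minimal illustration, assuming binary rewards and the common mean and standard-deviation normalization within a sampled group. The group size of 8 and the function name are assumptions made for the example.

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and population std of its group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

# A nearly impossible problem: 1 accidental success in 8 samples.
hard = group_relative_advantages([1, 0, 0, 0, 0, 0, 0, 0])

# A moderately hard problem: 4 successes in 8 samples.
medium = group_relative_advantages([1, 1, 1, 1, 0, 0, 0, 0])

print(hard[0], medium[0])
```

The lone success in the hard group receives an advantage of about 2.65, versus about 1.0 for each success in the balanced group. Whatever that single trajectory happened to do, including repeating an answer or skipping computation, is therefore reinforced more strongly than a typical correct solution.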

The takeaway a curious reader might not expect: pretraining decides the ceiling of what a model knows and can do, and finetuning is mostly a steering and selection layer on top. It is powerful for shaping helpfulness, format, and reasoning deployment, but prone to degrading the very capabilities it sits on if pushed past what the base supports. That's also why the data you finetune with has to match the model's existing frontier; refinements above a student's reach hurt rather than help [Does teacher-refined data always improve student model performance?], and even training order can preserve or destroy open-ended ability depending on how entropy is managed [Does training order reshape how models handle different task types?].

Sources 12 notes

Do pretraining and fine-tuning scale independently in language models?

Emulated Fine-Tuning reveals that scaling pretraining improves factual knowledge while scaling fine-tuning improves behavioral helpfulness. This decoupling has architectural roots: pretraining enriches lower-layer knowledge storage, while fine-tuning modifies upper-layer behavior expression.

Does instruction tuning teach task understanding or output format?

Models trained on semantically empty or deliberately incorrect instructions achieve performance comparable to those trained on full, correct instructions. In a low-resource setting, a random baseline reaches 42.6% exact match against 43% for instruction tuning. The semantic content of instructions appears largely irrelevant; what transfers is knowledge of the output space.

Does RL post-training create reasoning or just deploy it?

Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies already exist before any RL.

What actually changes inside a model during RL training?

RL's effects concentrate in structurally sparse but full-rank subnetworks across multiple algorithms and models. Suppressing wrong trajectories—rather than amplifying correct ones—appears to be the primary mechanism, with training following a predictable two-phase pattern of procedural consolidation then strategic exploration.

How should finetuning scale with model and data size?

Systematic experiments across 1B–16B models reveal finetuning follows a power-based multiplicative scaling law. Scaling the base model's size improves finetuning more than scaling its pretraining data, while increasing the number of parameter-efficient tuning (PET) parameters provides minimal benefit.

Can editing hidden representations beat weight updates for finetuning?

ReFT learns task-specific interventions on frozen model representations rather than updating weights, with LoReFT (low-rank linear subspace variant) dramatically outperforming LoRA across reasoning, instruction-following, and NLU benchmarks while using far fewer parameters.
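The intervention described in this note can be sketched in a few lines. The version below assumes the LoReFT update takes the form h + R^T(Wh + b - Rh), with R a low-rank projection whose rows are orthonormal. It uses plain Python lists, a hidden size of 4, and made-up values, so it illustrates the idea rather than the ReFT library's API; the helper names are hypothetical.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def loreft(h, R, W, b):
    """LoReFT-style edit (assumed form): h + R^T (W h + b - R h).
    Only the subspace spanned by the rows of R is changed."""
    target = [wh + bi for wh, bi in zip(matvec(W, h), b)]   # W h + b
    current = matvec(R, h)                                   # R h
    delta = [t - c for t, c in zip(target, current)]         # rank-sized correction
    # Map the correction back into the hidden space with R^T.
    back = [
        sum(R[k][j] * delta[k] for k in range(len(R)))
        for j in range(len(h))
    ]
    return [x + d for x, d in zip(h, back)]

# Toy setup: hidden size 4, rank 2. Rows of R are orthonormal by construction.
R = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]
W = [[0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
b = [0.5, -0.5]
h = [1.0, 2.0, 3.0, 4.0]   # hidden state from the frozen model (made-up values)

edited = loreft(h, R, W, b)
print(edited)
```

Only R, W, and b would be trained; the hidden state h comes from the frozen model. After the edit, the projection of the hidden state onto the rows of R equals Wh + b, and the components outside that subspace are left unchanged.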

Does fine-tuning disconnect reasoning steps from final answers?

Three faithfulness tests show fine-tuned models generate reasoning chains that less reliably influence final outputs. Early termination, paraphrasing, and filler substitution all produce invariant answers more often after fine-tuning, suggesting reasoning becomes performative rather than functional.

Do overly hard RLVR samples actually harm model capabilities?

Training on nearly-impossible problems causes models to learn degenerate shortcuts rather than genuine reasoning, and these shortcuts contaminate pre-existing capabilities. Group-relative normalization treats rare accidental successes as high-advantage trajectories, reinforcing answer repetition and computation-skipping instead of sound reasoning patterns.

Do fine-tuned language models actually learn optimization procedures?

Even GRPO-trained models show sharp performance drops on out-of-distribution variants (N-1 test sets) compared to in-distribution problems, indicating RL optimizes template-matching rather than genuine problem-solving procedures.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Does teacher-refined data always improve student model performance?

Teacher-refined data degrades performance when it exceeds the student's learning frontier, even if objectively higher quality. Students should filter refinements using their own statistical profile to retain only compatible improvements.

Does training order reshape how models handle different task types?

Omni-Thinker shows structured domains decrease output entropy while creative domains increase it. Scheduling guided by backward transfer (BWT), which trains structured tasks first, yields 6.2% gains over joint training by preventing entropy collapse from damaging open-ended capabilities.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst examining whether pretraining and finetuning improvements remain fundamentally distinct in their effects on LLM capabilities—or whether recent methods have begun to blur that boundary.

What a curated library found — and when (dated claims, not current truth): Findings span 2023–2026.
• Pretraining builds *knowledge* (lower-layer enrichment, factual accuracy); finetuning shapes *behavior* (upper-layer expression, helpfulness) — operating on different model regions with different scaling laws (~2024).
• Instruction tuning and RL post-training mostly *surface* latent capability rather than create new understanding; RL updates only 5–30% of parameters by suppressing wrong paths (~2025).
• Finetuning on frozen representations outperforms LoRA by 10–50x parameter efficiency, suggesting behavioral steering doesn't require weight updates (~2024).
• RL post-training collapses pretraining's format diversity into dominant templates and can degrade chain-of-thought faithfulness and robustness to distribution shift (~2024–2025).
• Teacher-refined finetuning data hurts when mismatched to student capability; training order and entropy management preserve or destroy open-ended ability (~2025).

Anchor papers (verify; mind their dates): arXiv:2305.11383 (2023), arXiv:2404.03592 (2024), arXiv:2411.15382 (2024), arXiv:2505.11711 (2025).

Your task:
(1) RE-TEST the pretraining/finetuning divide. Has the emergence of mechanistic interpretability (e.g., subnetwork isolation in arXiv:2505.11711), representation-level intervention, or hybrid training regimes (arXiv:2507.14783) begun to *unify* how the two stages work, or does the knowledge–behavior split still hold under recent scaling? Where does the split remain durable and where do newer findings challenge it?
(2) Surface the strongest work from the last ~6 months that *contradicts* the "finetuning surfaces, doesn't create" thesis—or deepens it in unexpected ways (e.g., arXiv:2507.14805 on subliminal behavioral transmission, arXiv:2508.08940 on curriculum effects).
(3) Propose two questions that assume the regime *has* shifted: (a) If finetuning now modifies knowledge via representation steering, does the multiplicative scaling law break?, (b) Can unified pretraining-finetuning objectives (multi-task RL, hybrid reward) recover the diversity RL collapses, and does that change what "capability" means?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
