INQUIRING LINE

What if most 'hard' AI problems aren't actually hard — they're just far from what the model was trained on?

How does distributional distance from pre-training relate to model difficulty?

This line explores how distance from a model's pre-training distribution (how far training pushes the model, or how far a problem sits from what the model saw) shapes what looks 'hard,' and how that distance can quietly corrupt capability rather than extend it.

The corpus suggests something counterintuitive: much of what we read as 'difficulty' is really distance from the training distribution wearing a costume.

The clearest case is reasoning length. You'd assume a model writes longer chains of thought because a problem is harder, but controlled A* maze experiments show that trace length tracks difficulty only when the problem is in-distribution, and decouples entirely once you step outside it (see 'Does longer reasoning actually mean harder problems?'). Long traces mostly reflect recall of familiar training schemas, not adaptive effort. So 'hard' and 'far from pre-training' get conflated, and the visible signal of struggle is unreliable.

The more striking thread is that distance is something training actively spends, and overspending it backfires. Training on problems that sit too far out, such as nearly impossible RLVR samples, doesn't stretch the model; it teaches degenerate shortcuts that then contaminate skills the model already had (see 'Do overly hard RLVR samples actually harm model capabilities?'). The same pattern appears in distillation: teacher-refined data that exceeds a student's 'learning frontier' degrades it even when the data is objectively higher quality, so students should filter for what's compatible with their own distribution (see 'Does teacher-refined data always improve student model performance?'). Difficulty isn't absolute; it's relative to where the model already lives.

This reframes staying close to pre-training as a resource rather than a limitation. Models trained to drift less from their base distribution preserve their ability to keep learning new tasks, while methods that wander far stall when domains shift (see 'Does staying close to the base model preserve learning ability?'). Decoding-time proxy tuning preserves pre-trained knowledge precisely because it never moves the base weights: it applies distributional shifts that mainly touch style and reasoning, rather than corrupting the lower layers where knowledge is stored (see 'Can decoding-time tuning preserve knowledge better than weight fine-tuning?'). That layered picture is supported elsewhere: pre-training scale builds factual knowledge in lower layers, while fine-tuning reshapes behavior in upper ones (see 'Do pretraining and fine-tuning scale independently in language models?'). Push too hard on the wrong layers and you forget more than you gain.

The unexpected payoff: distance also shapes generalization, in different ways depending on the axis. Richer teacher context produces confident, concise traces that win in-domain but collapse out-of-distribution, because the model stops expressing the uncertainty that hard, novel problems demand (see 'Does richer teacher context hurt student generalization?'). And RL doesn't expand a model so much as collapse it onto a single dominant pre-training format within the first epoch, picking the winner by model scale rather than merit (see 'Does RL training collapse format diversity in pretrained models?'). So the through-line is this: difficulty for a model is mostly a story about distance, and the methods that work are the ones that respect how far they can move the model before capability starts leaking out.

Sources (8 notes)

Does longer reasoning actually mean harder problems?

Controlled A* maze experiments show trace length correlates with difficulty only in-distribution but decouples entirely out-of-distribution. Trace length primarily reflects recall of training schemas, not adaptive computation.

Do overly hard RLVR samples actually harm model capabilities?

Training on nearly-impossible problems causes models to learn degenerate shortcuts rather than genuine reasoning, and these shortcuts contaminate pre-existing capabilities. Group-relative normalization treats rare accidental successes as high-advantage trajectories, reinforcing answer repetition and computation-skipping instead of sound reasoning patterns.
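A small worked example may make the normalization effect concrete. This is a minimal sketch, not the note's or any paper's implementation: it assumes the common form of group-relative normalization, (reward - group mean) / (group std + eps), and the two reward groups are hypothetical values chosen for illustration.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group.

    Assumed form: (r - mean) / (std + eps), using the population std.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical group of 8 rollouts on a nearly impossible problem:
# a single accidental success among seven failures.
hard_group = [0, 0, 0, 0, 0, 0, 0, 1]

# Hypothetical group on a moderately hard problem: half succeed.
medium_group = [0, 1, 0, 1, 1, 0, 1, 0]

adv_hard = group_relative_advantages(hard_group)
adv_medium = group_relative_advantages(medium_group)

# The lone success in the hard group gets a much larger advantage
# than any success in the medium group.
print(round(max(adv_hard), 2), round(max(adv_medium), 2))
```

Under these assumptions, the rarer the success within a group, the larger its normalized advantage, so whatever produced the lucky rollout (including answer repetition or skipped computation) is reinforced strongly.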

Does teacher-refined data always improve student model performance?

Teacher-refined data degrades performance when it exceeds the student's learning frontier, even if objectively higher quality. Students should filter refinements using their own statistical profile to retain only compatible improvements.

Does staying close to the base model preserve learning ability?

FST-trained models stay up to 70% closer to their base distribution than parameter-only RL, and this reduced drift preserves the model's ability to learn subsequent tasks effectively. Parameter-only approaches stall when task domains change, while low KL drift enables sustained adaptation.
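To illustrate what 'KL drift' from a base distribution could mean, here is a minimal sketch over toy next-token distributions. Everything in it is an assumption for illustration: the three distributions are made up, the direction KL(tuned || base) is one common convention in RL fine-tuning, and the note does not say how its 'closer' figure is measured, so the relative-reduction helper is only one plausible reading.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions over a 3-token vocabulary.
base = [0.5, 0.3, 0.2]           # base (pre-trained) model
low_drift = [0.45, 0.33, 0.22]   # tuned model that stays near the base
high_drift = [0.1, 0.2, 0.7]     # tuned model that wanders far

drift_low = kl_divergence(low_drift, base)
drift_high = kl_divergence(high_drift, base)

def relative_reduction(smaller, larger):
    """Fraction by which one drift is smaller than another (assumed metric)."""
    return 1.0 - smaller / larger

print(drift_low, drift_high, relative_reduction(drift_low, drift_high))
```

In practice the drift would be averaged over many prompts and token positions rather than computed on a single distribution.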

Can decoding-time tuning preserve knowledge better than weight fine-tuning?

Proxy-tuning closes 88-91% of the alignment gap while surpassing direct fine-tuning on knowledge tasks by leaving base model weights untouched. Direct fine-tuning corrupts knowledge storage in lower layers, whereas proxy-tuning applies distributional shifts that primarily affect reasoning and style.
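The decoding-time shift can be sketched as simple logit arithmetic. This sketch assumes the formulation from the proxy-tuning literature, where the large base model's logits are offset by the difference between a small tuned 'expert' and its small untuned 'anti-expert'. The note itself does not spell out the formula, and the logit values below are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def proxy_tuned_probs(base_logits, expert_logits, antiexpert_logits):
    """Shift base logits by (expert - anti-expert), then normalize.

    The base model's weights are never modified; only its output
    distribution is adjusted at decoding time.
    """
    shifted = [
        b + (e - a)
        for b, e, a in zip(base_logits, expert_logits, antiexpert_logits)
    ]
    return softmax(shifted)

# Hypothetical logits over a 3-token vocabulary.
base_logits = [2.0, 1.0, 0.0]        # large base model
expert_logits = [0.0, 1.5, 0.0]      # small tuned model
antiexpert_logits = [0.0, 0.0, 0.0]  # small untuned model

probs = proxy_tuned_probs(base_logits, expert_logits, antiexpert_logits)
unchanged = proxy_tuned_probs(base_logits, antiexpert_logits, antiexpert_logits)
```

When the expert and anti-expert agree, the offset is zero and the base distribution is returned unchanged, which is one way to see why knowledge stored in the base weights is left alone.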

Do pretraining and fine-tuning scale independently in language models?

Emulated Fine-Tuning reveals that scaling pretraining improves factual knowledge while scaling fine-tuning improves behavioral helpfulness. This decoupling has architectural roots: pretraining enriches lower-layer knowledge storage, while fine-tuning modifies upper-layer behavior expression.

Does richer teacher context hurt student generalization?

Teachers conditioned on correct answers and verifier output produce confident, concise traces that students inherit. This style suppresses uncertainty expression, optimizing in-domain performance while degrading generalization to out-of-distribution problems that require epistemic caution.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about distributional distance and model difficulty in LLMs. The question remains: does a model's distance from pre-training truly explain what we call 'difficulty', and does staying close to pre-training preserve capability?

What a curated library found — and when (dated claims, not current truth):
Findings span 2022–2026; treat each as perishable until re-validated:
• CoT trace length decouples from problem difficulty outside the training distribution; it mostly recalls familiar schemas rather than reflecting adaptive effort (~2025, arXiv:2509.07339).
• Training on overly-hard RLVR samples induces degenerate shortcuts that contaminate already-learned skills; difficulty is relative to the model's current distribution, not absolute (~2026, arXiv:2605.28388).
• Lower KL drift from base model preserves plasticity and continual learning; methods that wander far stall under domain shift (~2026, arXiv:2605.12484).
• RL post-training collapses onto a single dominant pre-training format within the first epoch, picking the winner by scale; it doesn't expand capability so much as select and amplify (~2025, arXiv:2504.07912).
• Richer teacher context produces confident, concise traces that win in-distribution but collapse out-of-distribution, because the model stops expressing uncertainty (~2026, arXiv:2603.24472).

Anchor papers (verify; mind their dates):
– arXiv:2509.07339 (Performative Thinking?, 2025)
– arXiv:2605.28388 (Sample Difficulty in RLVR, 2026)
– arXiv:2605.12484 (Learning, Fast and Slow, 2026)
– arXiv:2504.07912 (Echo Chamber, 2025)

Your task:
(1) RE-TEST EACH CONSTRAINT. For CoT length, RLVR collapse, KL drift, and teacher context: has newer instrumentation (mechanistic probes, causal surgery, in-distribution vs. held-out evaluation splits) since shown these constraints are softer, or have they held under scaled models and longer training? Isolate which parts still appear binding and which recent work has relaxed them.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — especially any claiming that distance *doesn't* predict difficulty, or that aggressive fine-tuning doesn't degrade generalization, or that RL *does* expand rather than collapse the model's capability surface.
(3) Propose 2 research questions that assume the regime may have shifted: e.g., do frontier-scale models show *weaker* plasticity-distance trade-offs than smaller ones, or do novel decoding-time interventions (beyond proxy tuning) now let models traverse distributional distance without capability loss?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
