INQUIRING LINE

When you train an AI to reason in fluid concepts instead of words, it often forgets everything it already knew — what's happening?

Why does fine-tuning for continuous space cause catastrophic forgetting?

This explores why fine-tuning a model to reason in a continuous latent space (instead of discrete tokens) tends to erase pretrained knowledge — and what the corpus says about the deeper mechanism and the ways around it.

The corpus suggests that the problem is not really about continuous space (latent 'soft thoughts' rather than discrete words) at all: it is about *where* the new learning goes. When you update the model's own weights to support a new reasoning mode, you overwrite the very parameters that store pretrained knowledge. Several notes converge on the same diagnosis: forgetting is a misallocation problem, not an inherent cost of adaptation.

The sharpest evidence for that reframing comes from work showing that adaptation can be *split into two channels*. Fast-Slow Training (FST) routes task-specific lessons into optimized prompts (the fast, textual channel) while barely touching the weights (the slow channel), and it reaches the same performance 1.4–3x faster with far less forgetting (see "Can splitting adaptation into two channels reduce forgetting?"). The mechanism is measured directly in a companion note: FST-trained models stay up to 70% closer to their base distribution, in KL terms, than parameter-only RL, and that reduced drift preserves their *plasticity*, the ability to learn the next task at all. Weight-only approaches stall when the domain shifts because they have drifted too far (see "Does staying close to the base model preserve learning ability?"). So heavy weight updates do not just erase the past; they also degrade the future.
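To make "drift in KL terms" concrete, here is a minimal sketch of one way to measure it: average the per-position KL divergence between an adapted model's next-token distribution and the base model's. The logits, the function names, and the choice of KL(adapted || base) as the direction are all illustrative assumptions; the notes do not specify how the cited work computes its drift figure.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mean_kl_drift(adapted_logits, base_logits):
    """Average per-position KL(adapted || base) over a batch of positions."""
    kls = [
        kl_divergence(softmax(a), softmax(b))
        for a, b in zip(adapted_logits, base_logits)
    ]
    return sum(kls) / len(kls)

# Hypothetical next-token logits at two positions over a 3-token vocabulary.
base = [[2.0, 1.0, 0.1], [0.5, 0.5, 0.5]]
light_touch = [[2.1, 1.0, 0.1], [0.6, 0.5, 0.5]]   # e.g. mostly prompt-level adaptation
heavy_update = [[0.1, 1.0, 3.0], [3.0, 0.5, 0.1]]  # e.g. aggressive weight updates

drift_light = mean_kl_drift(light_touch, base)
drift_heavy = mean_kl_drift(heavy_update, base)
print(f"light drift: {drift_light:.4f} nats, heavy drift: {drift_heavy:.4f} nats")
```

In this toy setup the lightly adapted distribution stays much closer to the base than the heavily updated one, which is the quantity the plasticity argument tracks.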

The most direct answer to the continuous-reasoning version of the question is architectural: don't update the backbone at all. SoftCoT *freezes* the main LLM and delegates continuous-thought generation to a small auxiliary model, so the soft reasoning capability is added alongside pretrained knowledge rather than carved out of it (see "Can continuous reasoning avoid forgetting in instruction-tuned models?"). The same freeze-the-knowledge instinct shows up in proxy-tuning, which works at decoding time and actually *beats* direct fine-tuning on knowledge tasks. The paper pinpoints why: direct fine-tuning corrupts knowledge storage in the lower layers, while leaving the weights untouched confines the change to reasoning and style (see "Can decoding-time tuning preserve knowledge better than weight fine-tuning?"). On this account, the lower layers are where facts are stored, so continuous-space fine-tuning that reaches them is exactly what does the damage.
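The decoding-time idea can be sketched in a few lines. As proxy-tuning is usually described, the large base model's next-token logits are shifted by the difference between a small tuned model and its untuned counterpart, so no weight of the large model changes. The vocabulary, logit values, and function names below are made up for illustration and do not come from the paper.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def proxy_tuned_logits(base_large, tuned_small, untuned_small):
    """Shift the large model's logits by what tuning changed in the small model."""
    return [b + (t - u) for b, t, u in zip(base_large, tuned_small, untuned_small)]

# Hypothetical 3-token vocabulary and logits for a single decoding step.
vocab = ["Paris", "London", "Sure!"]
base_large = [3.0, 1.0, 0.0]     # large untuned model: holds the factual knowledge
untuned_small = [1.0, 1.0, 0.0]  # small model before tuning
tuned_small = [1.0, 1.0, 2.0]    # small model after tuning: learned a chat style

steered = proxy_tuned_logits(base_large, tuned_small, untuned_small)
probs = softmax(steered)
top_token = vocab[probs.index(max(probs))]
print(dict(zip(vocab, [round(p, 3) for p in probs])), "->", top_token)
```

In this toy step the offset raises the stylistic token without reordering the two factual candidates, which mirrors the claim that the shift mostly touches reasoning and style. When the small model's tuning changes nothing, the offset is zero and the base logits pass through unchanged.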

Beyond freezing, the corpus offers two other escape routes. One is *parameter isolation*: identifying the core regions each task depends on and freezing those while merging the rest, which consistently beats naive multi-task fine-tuning (see "Can isolating task-specific parameters prevent multi-task fine-tuning interference?"). The other abandons weight updates entirely and stores new skills or experiences *outside* the model. VOYAGER keeps an embedding-indexed library of executable skills and composes new ones from old, learning continuously with no forgetting (see "Can agents learn new skills without forgetting old ones?"), while AgentFly pushes policy improvement entirely into episodic memory modules, never touching LLM parameters (see "Can agents learn continuously from experience without updating weights?").
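The external-store route can be illustrated with a toy skill library: skills are saved as callables, indexed by an embedding of their description, retrieved by similarity, and composed into new skills. This is a sketch in the spirit of the VOYAGER note, not its implementation. The bag-of-words `embed` function stands in for a learned text embedder, and `SkillLibrary` and the skill names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding (assumption: a real system uses a learned embedder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class SkillLibrary:
    """Append-only store: adding a skill never modifies an existing one."""

    def __init__(self):
        self._skills = []  # entries are (description, embedding, callable)

    def add(self, description, fn):
        self._skills.append((description, embed(description), fn))

    def retrieve(self, query):
        """Return the (description, callable) pair most similar to the query."""
        q = embed(query)
        desc, _, fn = max(self._skills, key=lambda s: cosine(q, s[1]))
        return desc, fn

    def __len__(self):
        return len(self._skills)

library = SkillLibrary()
library.add("chop wood from a tree", lambda: ["wood"])
library.add("craft planks from wood", lambda: ["planks"])

def craft_table():
    # A new skill composed from skills already stored in the library.
    _, chop = library.retrieve("chop a tree for wood")
    _, make_planks = library.retrieve("craft planks")
    return chop() + make_planks() + ["table"]

library.add("craft a table using planks", craft_table)

description, skill = library.retrieve("craft a table")
print(description, "->", skill())
```

Because the store is append-only and sits outside the model, learning the table skill cannot overwrite the wood or plank skills, which is the property the note attributes to this family of methods.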

The thread worth taking away: fine-tuning into continuous space causes forgetting because it relocates a *behavior* (how to reason) by overwriting a *store* (what is known), and the two happen to share the same weights. There is also a subtler cost: fine-tuning can make reasoning chains *look* right while they no longer actually drive the answer (see "Does fine-tuning disconnect reasoning steps from final answers?"). Every successful fix in the corpus works by separating behavior from store, whether through a second model, a prompt channel, an external library, or an untouched base, so that the new capability and the old knowledge stop competing for the same parameters.

Sources 8 notes

Can splitting adaptation into two channels reduce forgetting?

Fast-Slow Training routes task-specific lessons into optimized prompts while keeping parameter updates minimal, reaching equivalent performance 1.4–3x faster with substantially less catastrophic forgetting and plasticity loss, demonstrating that forgetting is a misallocation problem rather than an inherent cost.

Does staying close to the base model preserve learning ability?

FST-trained models stay up to 70% closer to their base distribution than parameter-only RL, and this reduced drift preserves the model's ability to learn subsequent tasks effectively. Parameter-only approaches stall when task domains change, while low KL drift enables sustained adaptation.

Can continuous reasoning avoid forgetting in instruction-tuned models?

SoftCoT avoids catastrophic forgetting by keeping the main LLM frozen while delegating soft thought generation to a small auxiliary model. This architectural separation maintains pre-trained knowledge while enabling continuous reasoning.

Can decoding-time tuning preserve knowledge better than weight fine-tuning?

Proxy-tuning closes 88-91% of the alignment gap while surpassing direct fine-tuning on knowledge tasks by leaving base model weights untouched. Direct fine-tuning corrupts knowledge storage in lower layers, whereas proxy-tuning applies distributional shifts that primarily affect reasoning and style.

Can isolating task-specific parameters prevent multi-task fine-tuning interference?

Research shows that identifying core parameter regions per task, clustering overlapping tasks, and freezing core parameters while geometrically merging non-core parameters consistently outperforms standard multi-task fine-tuning. Temporal task scheduling alone proves insufficient without explicit structural parameter isolation.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Can agents learn continuously from experience without updating weights?

AgentFly formalizes agent learning as a Memory-augmented MDP with three memory modules (case, subtask, tool) that enable credit assignment and policy improvement entirely through memory operations. The approach achieved 87.88% on GAIA validation without modifying LLM parameters.

Does fine-tuning disconnect reasoning steps from final answers?

Three faithfulness tests show fine-tuned models generate reasoning chains that less reliably influence final outputs. Early termination, paraphrasing, and filler substitution all produce invariant answers more often after fine-tuning, suggesting reasoning becomes performative rather than functional.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

On the Impact of Fine-Tuning on Chain-of-Thought Reasoning (2.50 match)
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning (1.76 match)
Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments (1.75 match)
SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs (1.72 match)
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs (1.72 match)
Learning, Fast and Slow: Towards LLMs That Adapt Continually (1.72 match)
Towards Mechanistically Understanding Why Memorized Knowledge Fails to Generalize in Large Language Model Finetuning (1.67 match)
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining (1.66 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a researcher auditing claims about catastrophic forgetting in continuous-space fine-tuning. The question: *Why does fine-tuning for continuous reasoning (soft thought spaces) erase pretrained knowledge?* A curated library (2024–2026) proposed a counter-intuitive answer: the problem isn't continuous space itself—it's weight reuse. When you update the same parameters for both new reasoning and old knowledge storage, they collide. The library's core claims (spanning 2024–2026):

• Fine-tuning causes forgetting because task-specific learning overwrites knowledge-storing parameters in lower layers; splitting learning into fast (prompt) and slow (weight) channels reaches equivalent performance 1.4–3x faster with substantially less forgetting and plasticity loss (2026-05).
• Models trained with the fast-slow split stay up to 70% closer to their base distribution (in KL terms) than parameter-only RL, and that lower drift preserves the ability to learn subsequent tasks; parameter-only approaches stall when the task domain changes (2026-05).
• Freezing the backbone and delegating continuous reasoning to auxiliary modules (SoftCoT) avoids the collision entirely (2025-02).
• Parameter isolation (identifying and freezing task-core regions) beats naive multi-task fine-tuning (2025-08).
• Storing skills externally (embedding libraries, episodic memory) enables continual learning with zero forgetting (2026-04, 2026-05).
• Fine-tuning can corrupt chain-of-thought faithfulness independent of accuracy gains (2024-11).

Anchor papers (verify; mind their dates): arXiv:2605.12484 (Learning, Fast and Slow; 2026-05), arXiv:2502.12134 (SoftCoT; 2025-02), arXiv:2508.21741 (Not All Parameters Are Created Equal; 2025-08), arXiv:2411.15382 (Fine-Tuning Impact on CoT; 2024-11).

Your task:
(1) RE-TEST each constraint. For every mechanism above, probe whether (a) newer scaling laws, instruction-tuning regimes, or mixture-of-expert architectures have *relaxed* the weight-reuse bottleneck, (b) improved KL-drift tracking or plasticity measures have *tightened* the reported link between low KL drift and preserved plasticity, or (c) stronger auxiliary-model training or in-context prompting has *superseded* the freeze-the-backbone approach. Separate the durable insight (weight reuse is a real collision problem) from perishable claims (the specific figures, the superiority of particular fixes). Be direct: does the constraint still hold, or has it moved?
(2) Surface the strongest CONTRADICTING work from the last ~6 months—papers arguing forgetting is unavoidable, or that continuous-space reasoning *requires* weight updates, or that external libraries introduce their own failure modes.
(3) Propose two research questions that *assume the regime has moved*: (a) if weight freezing is now standard, what new forgetting modes emerge? (b) if KL drift is no longer the bottleneck, what is?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
