What determines whether accumulated state generalizes spuriously across continual learning domains?
This explores when knowledge a model carries forward from one continual-learning task bleeds into another as false generalization — and what structural factors decide whether that carryover helps or corrupts.
Accumulated state here means the weights, memories, or skills a model carries across tasks as it learns domain after domain. The corpus points to one dividing line above all others: where that state lives. When learning is written into shared weights, drift is the enemy. Models that wander far from their base distribution lose plasticity and stall the moment the domain shifts, while those that stay close keep adapting: FST-trained models stay up to 70% closer to their base distribution than parameter-only RL, and that lower drift preserves the ability to learn the next task at all (see: Does staying close to the base model preserve learning ability?). Spurious generalization, in other words, often rides in on the same parameter updates that overwrote the old domain in the first place.
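One way to make "drift from base" concrete is to average a KL divergence between the tuned model's next-token distribution and the base model's. The sketch below is a minimal illustration under assumptions: the direction KL(tuned || base), the helper names, and the toy logits are all hypothetical and are not taken from the cited work.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def mean_drift(tuned_logits, base_logits):
    """Average KL(tuned || base) over a batch of per-position logit vectors."""
    pairs = list(zip(tuned_logits, base_logits))
    return sum(kl_divergence(softmax(t), softmax(b)) for t, b in pairs) / len(pairs)

# Hypothetical logits over a 3-token vocabulary at two positions.
base = [[2.0, 1.0, 0.1], [0.5, 0.5, 0.5]]
close = [[2.1, 1.0, 0.1], [0.5, 0.6, 0.5]]   # small update: stays near base
far = [[0.1, 1.0, 4.0], [3.0, 0.5, 0.1]]     # large update: wanders away

print("drift (close):", mean_drift(close, base))
print("drift (far):  ", mean_drift(far, base))
```

Tracking a number like this across tasks gives a rough signal of how far shared weights have moved, which is the quantity the paragraph above ties to lost plasticity.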
The sharpest contrast in the corpus is between weight-based accumulation and externalized state. When skills are stored outside the weights, as an indexed library of executable skills composed from simpler pieces, agents learn continuously without the catastrophic forgetting that weight updates cause (see: Can agents learn new skills without forgetting old ones?). The same move appears in agents that adapt entirely through episodic memory rather than parameter changes, formalizing learning as memory operations and reaching 87.88% on GAIA validation without modifying the model's parameters (see: Can agents learn continuously from experience without updating weights?), and in agents that store verbal self-diagnoses after each failure (see: Can agents learn from failure without updating their weights?). Externalized state generalizes more honestly because each piece is named, inspectable, and composed deliberately, rather than blended into an opaque distribution where a useful skill from domain A silently bends predictions in domain B.
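The "named, inspectable, composed deliberately" property can be sketched as a small skill registry. This is a toy under stated assumptions: `SkillLibrary`, the two example skills, and plain name lookup are hypothetical stand-ins (the cited system indexes skills by embedding, which is not reproduced here).

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Skill = Callable[[State], State]

class SkillLibrary:
    """Skills live outside any model weights: each is named and inspectable."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}
        self._docs: Dict[str, str] = {}

    def add(self, name: str, fn: Skill, doc: str) -> None:
        # Refuse to overwrite: learning a new skill never alters an old one.
        if name in self._skills:
            raise KeyError(f"skill {name!r} already exists")
        self._skills[name] = fn
        self._docs[name] = doc

    def compose(self, name: str, steps: List[str], doc: str) -> None:
        # Build a new skill explicitly from existing, named pieces.
        missing = [s for s in steps if s not in self._skills]
        if missing:
            raise KeyError(f"unknown skills: {missing}")
        parts = [self._skills[s] for s in steps]

        def composite(state: State) -> State:
            for part in parts:
                state = part(state)
            return state

        self.add(name, composite, doc)

    def run(self, name: str, state: State) -> State:
        return self._skills[name](dict(state))  # copy so callers keep their input

    def describe(self) -> Dict[str, str]:
        return dict(self._docs)

def gather_wood(state: State) -> State:
    state["wood"] = state.get("wood", 0) + 1
    return state

def craft_planks(state: State) -> State:
    if state.get("wood", 0) > 0:
        state["wood"] -= 1
        state["planks"] = state.get("planks", 0) + 4
    return state

library = SkillLibrary()
library.add("gather_wood", gather_wood, "collect one unit of wood")
library.add("craft_planks", craft_planks, "turn one wood into four planks")
library.compose("make_planks", ["gather_wood", "craft_planks"], "gather, then craft")

result = library.run("make_planks", {})
print(result)
print(library.describe())
```

Because composition is explicit and additions cannot overwrite existing entries, a skill learned in one domain only affects another domain when something calls it by name.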
What actually *triggers* the spurious carryover is distributional shift meeting memorized shortcuts. A revealing finding: in chain-of-thought reasoning, *local* memorization (predicting from the immediately preceding tokens) accounts for up to 67% of reasoning errors, and it gets worse precisely as complexity rises and the task drifts from the training distribution (see: Where do memorization errors arise in chain-of-thought reasoning?). That is spurious generalization in miniature: state that was a reliable pattern in-domain becomes a misfire out-of-domain. The boundary between memorizing and genuinely generalizing turns out to be a measurable capacity threshold: GPT-family models memorize up to roughly 3.6 bits per parameter, and only once that capacity fills does the phase transition into real generalization (grokking) begin (see: When do language models stop memorizing and start generalizing?). Below that line, accumulated state is just stored examples waiting to be misapplied.
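The capacity threshold lends itself to back-of-the-envelope arithmetic. The sketch below takes the roughly 3.6 bits-per-parameter figure quoted above at face value. The model size and the dataset's information content are hypothetical inputs, and estimating the latter for a real corpus is a separate problem not addressed here.

```python
BITS_PER_PARAM = 3.6  # approximate figure quoted above for GPT-family models

def capacity_bits(n_params, bits_per_param=BITS_PER_PARAM):
    """Rough memorization capacity of a model, in bits."""
    return n_params * bits_per_param

def fill_ratio(dataset_bits, n_params):
    """Dataset information divided by capacity; above 1.0 the data no longer fits."""
    return dataset_bits / capacity_bits(n_params)

n_params = 1_000_000_000            # hypothetical 1B-parameter model
cap = capacity_bits(n_params)
print("capacity in bits:", cap)
print("capacity in GB:  ", cap / 8 / 1e9)

dataset_bits = 7.2e9                # hypothetical information content of the training set
print("fill ratio:      ", fill_ratio(dataset_bits, n_params))
```

On these assumed numbers the capacity is about 0.45 GB of stored information, and a fill ratio above 1 marks the regime in which, per the finding above, memorization alone can no longer absorb the data.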
There's a quieter, more surprising thread: models seem to have built-in defenses against this. Under out-of-distribution tasks, hidden states sparsify in a systematic, localized way that acts as a selective filter, stabilizing performance rather than collapsing it (see: Do language models sparsify their activations under difficult tasks?). And what you train *on* shapes how robustly state transfers: training on complete, messy exploration paths (failures, backtracking, recovery) produces more robust reasoning than training on clean shortcut solutions, because the model internalizes search rather than memorizing answers that won't survive a domain change (see: Can models learn better by training on messy exploration paths?).
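To see what "hidden states sparsify" could mean operationally, one simple proxy is the fraction of activations with near-zero magnitude. This is only an illustrative metric: the threshold, the function name, and the activation values are made up, and the cited work may measure sparsity differently.

```python
def activation_sparsity(hidden, threshold=1e-3):
    """Fraction of activations whose magnitude falls below `threshold`.

    `hidden` is a list of per-token activation vectors (lists of floats).
    """
    flat = [abs(x) for row in hidden for x in row]
    if not flat:
        raise ValueError("hidden must contain at least one activation")
    return sum(1 for x in flat if x < threshold) / len(flat)

# Made-up activations for two tokens each; not measurements from any model.
in_distribution = [[0.8, -0.4, 0.0, 0.3], [0.5, 0.2, -0.7, 0.0]]
out_of_distribution = [[0.0, 0.0, 1.9, 0.0], [0.0, -2.1, 0.0, 0.0]]

print("in-distribution sparsity:    ", activation_sparsity(in_distribution))
print("out-of-distribution sparsity:", activation_sparsity(out_of_distribution))
```

Comparing such a statistic layer by layer on familiar versus unfamiliar inputs is one way to check whether the localized pattern described above shows up in a given model.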
So the determinants stack up: how far the weights drift from base, whether state is externalized and composable or baked into shared parameters, whether the model has crossed from memorization into genuine generalization, and whether it was trained on brittle shortcuts or robust search. One finding you might not expect: RL agents will quietly recruit the *environment itself* as external memory, and there is a mathematical proof that environmental artifacts reduce the information an agent must store internally (see: Do RL agents accidentally use environments as memory?). Sometimes the safest place for accumulated state to live isn't the weights or even a memory module; it's the world the agent is standing in.
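The idea of the environment carrying an agent's history can be illustrated with a toy loop-walking task. This is not the construction used in the cited proof: the `Ring` world and both agents are hypothetical. One agent remembers every landmark it has seen, while the other leaves a mark in each cell and only reads the mark in front of it.

```python
class Ring:
    """A circular corridor whose cells carry a label and a writable mark."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.marks = [False] * len(self.labels)
        self.pos = 0

    def label(self):
        return self.labels[self.pos]

    def marked(self):
        return self.marks[self.pos]

    def mark(self):
        self.marks[self.pos] = True

    def step(self):
        self.pos = (self.pos + 1) % len(self.labels)

def lap_remembering(ring):
    """Stop after one full lap by storing every label seen inside the agent."""
    seen = set()
    while ring.label() not in seen:
        seen.add(ring.label())
        ring.step()
    return len(seen)  # number of items held in internal memory

def lap_marking(ring):
    """Stop after one full lap by writing marks into the world instead."""
    cells = 0  # kept only for reporting; the stop rule never reads it
    while not ring.marked():
        ring.mark()
        ring.step()
        cells += 1
    return cells

internal_items = lap_remembering(Ring("abcde"))
world = Ring("abcde")
cells_visited = lap_marking(world)
print("items remembered internally:", internal_items)
print("cells visited using marks:  ", cells_visited)
```

Both agents complete exactly one lap, but the second decides when to stop from the cell in front of it alone, so the history it needs sits in the world rather than in the agent.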
Sources (9 notes)
FST-trained models stay up to 70% closer to their base distribution than parameter-only RL, and this reduced drift preserves the model's ability to learn subsequent tasks effectively. Parameter-only approaches stall when task domains change, while low KL drift enables sustained adaptation.
VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.
AgentFly formalizes agent learning as a Memory-augmented MDP with three memory modules (case, subtask, tool) that enable credit assignment and policy improvement entirely through memory operations. The approach achieved 87.88% on GAIA validation without modifying LLM parameters.
Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.
The STIM framework identifies local, mid-range, and long-range memorization sources in CoT reasoning. Local memorization, based on preceding tokens, accounts for up to 67% of reasoning errors, especially as complexity increases and distributional shift occurs.
GPT-family models have a measurable memorization capacity of approximately 3.6 bits-per-parameter. When this capacity fills, a phase transition triggers grokking—the shift from memorization to genuine generalization. This capacity is a property of individual models, not training algorithms.
As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.
Research shows that training on messy trajectories—failed attempts, self-correction, and backtracking—teaches more robust reasoning than training only on shortcut solutions. This approach models o1-style deep reasoning as search internalization rather than solution memorization.
A mathematical proof shows that environmental artifacts reduce the information needed to represent history in RL agents. Path-following agents naturally develop memory-like behavior through standard reward optimization, satisfying situated cognition criteria without explicit memory objectives.