SYNTHESIS NOTE

Do networks recover from forgetting before re-encountering documents?

When language models train cyclically on repeated documents, do they anticipate upcoming material and recover from forgetting in advance? This challenges the standard catastrophic-interference narrative about sequential training.

Synthesis note · 2026-06-03 · sourced from Knowledge Graphs

The default story of sequential training is catastrophic interference: forgetting of earlier documents increases monotonically as a network trains on a sequence of different documents. This paper studies a structured non-IID setting, in which documents are presented cyclically in a fixed, repeated order, and finds the opposite phenomenon: anticipatory recovery. Networks recover from the forgetting of a document before they encounter it again in the cycle, as if pre-positioning themselves for what is coming. The effect emerges and becomes more robust as parameter count grows, and it appears only when each document is well fitted before training moves on. Visualizations of weights, activations, and gradients show clear temporal structure.
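The cyclic setup and the idea of recovery before re-encounter can be made concrete with a small sketch. Everything here is illustrative: `cyclic_schedule` and `recovery_score` are hypothetical helper names, and the score is one plausible way to quantify anticipatory recovery, assumed for this note and not taken from the paper.

```python
def cyclic_schedule(num_docs, num_epochs):
    """Yield (epoch, doc_index) pairs in a fixed, repeated document order."""
    for epoch in range(num_epochs):
        for doc in range(num_docs):
            yield epoch, doc


def recovery_score(loss_trace):
    """Fraction of forgetting undone before the document is seen again.

    loss_trace holds the loss on ONE document, measured once after each
    training phase of a single cycle: the first entry is taken right after
    fitting that document, the last entry right before it comes around again.
    This definition is an assumption for illustration, not the paper's metric.
    """
    loss_after_fit = loss_trace[0]
    peak_loss = max(loss_trace)
    loss_before_return = loss_trace[-1]

    forgotten = peak_loss - loss_after_fit
    if forgotten <= 0:
        # No measurable forgetting in this cycle, so nothing to recover.
        return 0.0
    return (peak_loss - loss_before_return) / forgotten


# Made-up loss values: the loss peaks mid-cycle and then falls again
# before the document returns, which is the anticipatory pattern.
anticipatory = [0.5, 2.0, 2.5, 2.0, 1.5]
# Monotone forgetting, as the catastrophic-interference picture predicts.
monotone = [0.5, 1.0, 2.0]

order = list(cyclic_schedule(num_docs=3, num_epochs=2))
score_anticipatory = recovery_score(anticipatory)
score_monotone = recovery_score(monotone)
```

Under this definition a score of 0 corresponds to the monotone forgetting of the default story, and a score near 1 means the network has almost fully recovered the document by the time the cycle returns to it.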

The keeper is that over-parameterized networks in structured, repeating environments do not behave as the catastrophic-interference picture predicts: they exploit the temporal regularity of the training schedule to organize their weights anticipatorily. This is closer to how humans learn from structured, repeating material than the random-sampling default of LLM pretraining is.

This adds a training-dynamics surprise to the vault. It connects to the broader theme that structure in the learning process matters, alongside "Does teaching question patterns before document training improve knowledge access?" (order of encoding shapes outcomes) and "Is LLM forgetting really knowledge loss or alignment loss?" (forgetting is often recoverable, not destruction). Both complicate the simple catastrophic-forgetting narrative.

Original note title

networks trained on cyclically repeated documents anticipate and recover from forgetting before re-encountering them and this emerges with scale