SYNTHESIS NOTE
Model Architecture and Internals · Reasoning, Retrieval, and Evaluation · Training, RL, and Test-Time Scaling

When do language models stop memorizing and start generalizing?

Can we measure the exact capacity limit where models transition from memorizing training data to learning underlying patterns? Understanding this boundary could reshape how we think about model learning and privacy.

Synthesis note · 2026-02-23 · sourced from Memory

The standard approach to measuring memorization — attempting to extract training data from the model — is fundamentally flawed. Language models can be coerced to output almost any string, so generation is not proof of memorization. Conversely, a model may memorize patterns (every other token, structural regularities) without reproducing text verbatim. Extraction is neither necessary nor sufficient.

The formal separation: unintended memorization is the information a model contains about a specific dataset (the bits that would change if a particular example were removed from training). Generalization is the information the model contains about the true data-generation process. Isolating the generalization component and subtracting it out makes total memorization measurable.
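One way to picture the separation is a compression-style estimate. This is a minimal sketch, not the measurement procedure behind the note. It assumes a reference model can stand in for the true data-generation process, and that per-token log-probabilities are available from both models. The function names and the 4-token example are hypothetical.

```python
import math
from typing import Sequence


def code_length_bits(token_logprobs: Sequence[float]) -> float:
    """Bits needed to encode a sequence, given per-token natural-log probabilities."""
    return -sum(token_logprobs) / math.log(2)


def unintended_memorization_bits(
    reference_logprobs: Sequence[float],
    target_logprobs: Sequence[float],
) -> float:
    """Bits the target model saves on this example beyond the reference model.

    Assumption: the reference model approximates the true data-generation
    process, so whatever it already predicts is counted as generalization.
    """
    saved = code_length_bits(reference_logprobs) - code_length_bits(target_logprobs)
    # Clip at zero: an example the target encodes worse is not "memorized".
    return max(0.0, saved)


# Hypothetical 4-token example: the reference model assigns p = 0.5 per token,
# while the target model assigns p = 1.0 (it reproduces the example exactly).
reference = [math.log(0.5)] * 4
target = [math.log(1.0)] * 4
example_bits = unintended_memorization_bits(reference, target)
```

Here the reference model needs 4 bits to encode the example and the target model needs none, so the sketch attributes 4 bits to unintended memorization. Summing this quantity over a training set would give a dataset-level estimate under the same assumptions.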

The key empirical finding: GPT-family models have an approximate capacity of 3.6 bits-per-parameter for unintended memorization. Models memorize training data until this capacity fills. At that point, a phase transition occurs — grokking begins, and unintended memorization decreases as models begin to generalize.
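The arithmetic behind the capacity claim can be sketched as follows. The 3.6 bits-per-parameter figure comes from the note. The helper names, the example parameter count, and the idea of comparing capacity against a dataset's information content in bits are illustrative assumptions.

```python
BITS_PER_PARAM = 3.6  # approximate figure the note reports for GPT-family models


def capacity_bits(n_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Approximate unintended-memorization capacity in bits."""
    return n_params * bits_per_param


def capacity_megabytes(n_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """The same capacity expressed in decimal megabytes (1 MB = 1e6 bytes)."""
    return capacity_bits(n_params, bits_per_param) / 8 / 1e6


def is_saturated(
    dataset_bits: float, n_params: int, bits_per_param: float = BITS_PER_PARAM
) -> bool:
    """True when the data carries more bits than the model could memorize.

    Assumption: dataset_bits is some estimate of the dataset's information
    content; how to obtain that estimate is outside this sketch.
    """
    return dataset_bits > capacity_bits(n_params, bits_per_param)


# Hypothetical model with one billion parameters.
n_params = 1_000_000_000
total_bits = capacity_bits(n_params)
total_mb = capacity_megabytes(n_params)
```

For this hypothetical model the capacity works out to roughly 3.6e9 bits, or about 450 MB. Under the note's account, a dataset whose information content exceeds that figure is the regime in which memorization saturates and generalization begins.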

This reframes the grokking phenomenon mechanistically. Building on the note "What happens inside models when they suddenly generalize?", the capacity-filling measurement adds the trigger condition: grokking does not begin at an arbitrary training step; it begins when memorization saturates. The three phases are downstream of a capacity constraint, not of training duration per se.

The practical implication: memorization capacity is a measurable property of a specific model, not a property of the training algorithm. Two models trained by the same algorithm on the same data can have different memorization properties. This matters for privacy (which models leak more), for understanding generalization (capacity constrains when it begins), and for the question "Can AI pass every test while understanding nothing?": a model that appears to generalize may simply have unfilled memorization capacity.

Original note title

llm memorization formally separates into unintended memorization and generalization — 3.6 bits-per-parameter capacity fills before grokking begins