SYNTHESIS NOTE

How do transformers learn to reason across multiple steps?

Does multi-hop reasoning in transformers emerge through distinct learning phases, and what geometric patterns in hidden representations explain when reasoning succeeds or fails?

Synthesis note · 2026-02-22 · sourced from Reasoning Logic Internal Rules

Training transformers from scratch in a controlled symbolic environment reveals that implicit multi-hop reasoning — answering compositional queries without verbalizing intermediate steps — emerges through three distinct developmental stages:

Phase I: Memorization. The model fits training data (atomic facts and 2-hop compositions) quickly. Generalization to unseen queries remains minimal.

Phase II: In-Distribution Generalization. After memorization saturates, the model begins generalizing to unseen ID-ID compositions (both hops drawn from in-distribution triples), shifting from memorization to compositional reasoning within the training distribution. This resembles grokking: generalization emerges well after memorization converges.

Phase III: Cross-Distribution Reasoning. The model learns to compose out-of-distribution (OOD) triples in the first hop with ID triples in the second. This transition is slower than Phase II. Crucially, generalization fails consistently when the second hop comes from OOD triples, revealing a stronger bottleneck in the second relational step.
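The data regime behind these three phases can be sketched roughly as follows. This is a minimal illustration and not the original experimental code: the entity and relation counts, the rule that an "OOD" atomic fact is one never used inside a training composition, the 50/50 split of ID-ID queries, and the function name `build_symbolic_dataset` are all assumptions made for the example.

```python
import random

def build_symbolic_dataset(num_entities=50, num_relations=5, ood_fraction=0.2, seed=0):
    """Build atomic facts and 2-hop queries, bucketed by the ID/OOD status of each hop.

    Assumption for this sketch: every atomic fact is seen in training as a
    standalone fact, but only "ID" facts ever appear inside training compositions.
    """
    rng = random.Random(seed)
    # Atomic facts: each (head, relation) pair maps to one random tail entity.
    atomic = {(h, r): rng.randrange(num_entities)
              for h in range(num_entities) for r in range(num_relations)}
    keys = sorted(atomic)
    rng.shuffle(keys)
    ood = set(keys[:int(len(keys) * ood_fraction)])

    # Enumerate every 2-hop query (h, r1, r2) and tag each hop as ID or OOD.
    buckets = {"ID-ID": [], "OOD-ID": [], "ID-OOD": [], "OOD-OOD": []}
    for (h, r1), bridge in atomic.items():
        for r2 in range(num_relations):
            hop1 = "OOD" if (h, r1) in ood else "ID"
            hop2 = "OOD" if (bridge, r2) in ood else "ID"
            buckets[f"{hop1}-{hop2}"].append(((h, r1, r2), atomic[(bridge, r2)]))

    # Train on half of the ID-ID compositions; hold out the rest for Phase II tests.
    rng.shuffle(buckets["ID-ID"])
    half = len(buckets["ID-ID"]) // 2
    train_two_hop = buckets["ID-ID"][:half]
    test_sets = {**buckets, "ID-ID": buckets["ID-ID"][half:]}
    return atomic, ood, train_two_hop, test_sets

atomic, ood, train_two_hop, test_sets = build_symbolic_dataset()
for name, queries in test_sets.items():
    print(name, len(queries))
```

Under this framing, Phase II corresponds to accuracy rising on the held-out `ID-ID` bucket, Phase III to the `OOD-ID` bucket, and the second-hop bottleneck to the `ID-OOD` and `OOD-OOD` buckets staying low.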

Two mechanistic findings deepen the picture:

Cosine clustering as signature. Successful reasoning correlates with consistent clustering of intermediate entity representations within cosine similarity space. Models that reason well show intermediate representations that cluster by entity identity across diverse queries. This clustering provides a geometric explanation for when reasoning works and when it fails.
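One way to quantify this kind of clustering is to compare cosine similarity within and between bridge entities. The sketch below is an assumed metric for illustration, not necessarily the one used in the underlying work: `hidden` stands for intermediate-position hidden states collected from some layer, `entity_ids` for the bridge entity of each query, and the function name is hypothetical. It assumes no zero-norm vectors and at least two queries per entity.

```python
import numpy as np

def cosine_clustering_score(hidden, entity_ids):
    """Mean within-entity cosine similarity minus mean between-entity similarity.

    hidden:     (n_queries, d_model) array of intermediate representations.
    entity_ids: (n_queries,) array giving the bridge entity of each query.
    Higher values mean representations group more tightly by entity identity.
    """
    hidden = np.asarray(hidden, dtype=float)
    entity_ids = np.asarray(entity_ids)
    # Normalize rows so that dot products are cosine similarities.
    unit = hidden / np.linalg.norm(hidden, axis=1, keepdims=True)
    sims = unit @ unit.T
    same = entity_ids[:, None] == entity_ids[None, :]
    off_diag = ~np.eye(len(entity_ids), dtype=bool)  # drop self-similarity
    within = sims[same & off_diag].mean()
    between = sims[~same].mean()
    return within - between

# Synthetic check: representations built around per-entity centers vs. pure noise.
rng = np.random.default_rng(0)
ids = np.repeat(np.arange(4), 10)
centers = rng.normal(size=(4, 16))
clustered = centers[ids] + 0.05 * rng.normal(size=(40, 16))
scattered = rng.normal(size=(40, 16))
print(cosine_clustering_score(clustered, ids), cosine_clustering_score(scattered, ids))
```

Tracking such a score per layer over training would be one way to see whether the rise in clustering lines up with the phase transitions described above.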

Query-level exposure is required. Second-hop generalization fails unless the model encounters the exact compositional structure during training. Single-hop knowledge does not automatically compose into multi-hop capability. This finding bears on the question raised in "Do language models actually use their encoded knowledge?": encoding facts individually doesn't guarantee they compose.

Grokking provides parallel three-phase evidence. The "Progress Measures for Grokking via Mechanistic Interpretability" paper reverse-engineers the grokking phenomenon in transformers trained on modular addition, revealing three continuous phases that closely parallel the three developmental stages above: (1) memorization, in which the model fits training data quickly; (2) circuit formation, in which structured mechanisms gradually amplify in the weights and the generalizing circuit emerges; and (3) cleanup, in which memorizing components are removed. The parallel between memorization → ID generalization → cross-distribution reasoning and memorization → circuit formation → cleanup suggests a shared underlying dynamic: generalization requires extended training well beyond the point of memorization, and proceeds through the gradual formation of structured internal mechanisms. The grokking paper supports this with a mechanistic explanation: the generalizing circuit uses discrete Fourier transforms and trigonometric identities. See "What happens inside models when they suddenly generalize?"
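The Fourier-and-trigonometry idea can be illustrated numerically. The sketch below is not the learned network, only a hand-written instance of the identity: it scores each candidate answer c by summing cos(w(a + b - c)) over a few frequencies, which peaks exactly at c = (a + b) mod p. The function name and the particular frequencies `(1, 2, 5)` are arbitrary assumptions for the example.

```python
import numpy as np

def fourier_modular_add(a, b, p=113, freqs=(1, 2, 5)):
    """Compute (a + b) mod p using trig identities instead of integer arithmetic."""
    c = np.arange(p)          # every candidate answer
    logits = np.zeros(p)
    for k in freqs:
        w = 2 * np.pi * k / p
        # Angle-addition identities build cos/sin of w*(a+b) from per-input features.
        cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
        sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)
        # cos(w*(a+b-c)) = cos(w*(a+b))*cos(w*c) + sin(w*(a+b))*sin(w*c)
        logits += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)
    # Each term equals 1 only when a + b - c is a multiple of p, so the sum peaks there.
    return int(np.argmax(logits))

print(fourier_modular_add(100, 50), (100 + 50) % 113)
```

Because p is prime and each frequency is smaller than p, every cosine term is strictly below 1 for a wrong candidate, so even a single frequency identifies the answer in exact arithmetic; summing several frequencies widens the margin between the correct answer and its neighbours.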

The three-stage trajectory has implications for understanding RL-trained reasoning models. If base models already contain latent reasoning ability (see "Do base models already contain hidden reasoning ability?"), the question becomes: which stage does RL training target? If RL primarily accelerates Phase II (ID generalization), that would help explain why the choice of RL algorithm seems to matter less than expected (see "Does the choice of RL algorithm actually matter for reasoning?"): different algorithms may trigger the same phase transition.

Original note title

implicit multi-hop reasoning in transformers emerges through three developmental stages with cosine clustering as the mechanistic signature