SYNTHESIS NOTE

How do transformers learn to reason across multiple steps?

Does multi-hop reasoning in transformers emerge through distinct learning phases, and what geometric patterns in hidden representations explain when reasoning succeeds or fails?

Synthesis note · 2026-02-22 · sourced from Reasoning Logic Internal Rules

Training transformers from scratch in a controlled symbolic environment reveals that implicit multi-hop reasoning — answering compositional queries without verbalizing intermediate steps — emerges through three distinct developmental stages:

Phase I: Memorization. The model fits training data (atomic facts and 2-hop compositions) quickly. Generalization to unseen queries remains minimal.

Phase II: In-Distribution Generalization. After memorization saturates, the model begins generalizing to unseen ID-ID compositions, in which both hops use in-distribution (ID) triples. This marks a shift from memorization to compositional reasoning within the training distribution. It resembles grokking: generalization emerges well after memorization converges.

Phase III: Cross-Distribution Reasoning. The model learns to compose out-of-distribution (OOD) triples in the first hop with ID triples in the second. This transition is slower than Phase II. Crucially, generalization fails consistently when the second hop uses OOD triples, which points to a stronger bottleneck in the second relational step.
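The split types named in the three phases can be made concrete with a toy symbolic environment. The sketch below is illustrative only, not the setup used in the source study: the function name, the entity and relation counts, and the 20% OOD fraction are all hypothetical, and it assumes that "OOD" triples are atomic facts held out of every training composition.

```python
import random

def build_two_hop_splits(n_entities=30, n_relations=4, ood_fraction=0.2, seed=0):
    """Build atomic facts, mark some as OOD, and group 2-hop queries by split type.

    Assumption (not from the source): an OOD triple is an atomic fact that
    would be shown to the model on its own but never inside a training
    composition.
    """
    rng = random.Random(seed)
    entities = list(range(n_entities))
    # Atomic facts: every (head, relation) pair maps to one random tail entity.
    facts = {(h, r): rng.choice(entities)
             for h in entities for r in range(n_relations)}

    keys = list(facts)
    rng.shuffle(keys)
    ood_keys = set(keys[:int(len(keys) * ood_fraction)])

    def split_of(key):
        return "OOD" if key in ood_keys else "ID"

    queries = {"ID-ID": [], "OOD-ID": [], "ID-OOD": [], "OOD-OOD": []}
    for (h, r1), bridge in facts.items():
        for r2 in range(n_relations):
            second = (bridge, r2)  # second hop starts at the bridge entity
            label = f"{split_of((h, r1))}-{split_of(second)}"
            # Query is (head, r1, r2); the answer is the tail of the second hop.
            queries[label].append(((h, r1, r2), facts[second]))
    return facts, ood_keys, queries
```

Under these assumptions, training data would consist of all atomic facts plus a sampled subset of the ID-ID queries. The held-out ID-ID queries probe Phase II, the OOD-ID queries probe Phase III, and the ID-OOD and OOD-OOD queries probe the second-hop bottleneck.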

Two mechanistic findings deepen the picture:

Cosine clustering as signature. Successful reasoning correlates with consistent clustering of intermediate entity representations in cosine-similarity space: in models that reason well, these representations cluster by entity identity across diverse queries. This clustering offers a geometric account of when reasoning works and when it fails.
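One simple way to quantify this kind of clustering is to compare the average cosine similarity of representation pairs that share an intermediate entity with the average for pairs that do not. The sketch below is a hypothetical diagnostic, not the metric used in the source study; `clustering_gap` and its inputs are illustrative names. It assumes non-zero vectors and at least one same-entity pair and one different-entity pair.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def clustering_gap(vectors, entity_ids):
    """Mean same-entity cosine similarity minus mean different-entity similarity.

    vectors[i] is a hidden representation taken from one query, and
    entity_ids[i] is the intermediate entity that query passes through.
    A larger gap means tighter clustering by entity identity.
    """
    within, between = [], []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if entity_ids[i] == entity_ids[j]:
                within.append(sim)
            else:
                between.append(sim)
    return sum(within) / len(within) - sum(between) / len(between)

# Toy usage: two queries per intermediate entity, with 2-d stand-in hidden states.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_entities = [0, 0, 1, 1]
gap = clustering_gap(toy_vectors, toy_entities)
```

A gap near zero or below would indicate that representations are not organized by intermediate entity, which is the regime the note associates with failed reasoning.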

Query-level exposure is required. Second-hop generalization fails unless the model encounters the exact compositional structure during training. Single-hop knowledge does not automatically compose into multi-hop capability. This finding bears on the related note "Do language models actually use their encoded knowledge?": encoding facts individually does not guarantee that they compose.
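A rough way to audit a dataset for this condition is to check whether the second-hop triple of a test query ever appeared in second-hop position during training. The sketch below encodes one possible reading of "query-level exposure" and is an assumption rather than the source's definition; the function names and the (head, relation, tail) tuple format are hypothetical.

```python
def second_hop_positions(train_queries):
    """Collect every atomic triple that appears as the second hop of a training query.

    Each training query is a pair (first_triple, second_triple), where a
    triple is a (head, relation, tail) tuple.
    """
    return {second for _first, second in train_queries}

def has_second_hop_exposure(query, seen_second_hops):
    """True if the query's second-hop triple was used as a second hop in training."""
    _first, second = query
    return second in seen_second_hops

# Toy usage with symbolic entities and relations.
train = [(("a", "r1", "b"), ("b", "r2", "c"))]
seen = second_hop_positions(train)
new_first_hop = (("x", "r3", "b"), ("b", "r2", "c"))   # second hop seen in position
new_second_hop = (("a", "r1", "b"), ("b", "r4", "d"))  # second hop never composed
```

On this reading, `new_first_hop` is the kind of query that Phase III eventually handles, while `new_second_hop` is the kind that fails even if the fact `("b", "r4", "d")` was memorized on its own.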

Grokking provides parallel three-phase evidence. The paper "Progress Measures for Grokking via Mechanistic Interpretability" reverse-engineers grokking in transformers trained on modular addition and identifies three continuous phases that closely parallel the developmental stages above: (1) memorization, in which the model fits training data quickly; (2) circuit formation, in which structured mechanisms gradually strengthen in the weights as the generalizing circuit emerges; and (3) cleanup, in which memorizing components are removed. The parallel between memorization → ID generalization → cross-distribution reasoning and memorization → circuit formation → cleanup suggests a shared underlying dynamic: generalization requires extended training well beyond the point of memorization and proceeds through the gradual formation of structured internal mechanisms. The grokking paper backs this with a mechanistic explanation for its task: the generalizing circuit uses discrete Fourier transforms and trigonometric identities. See the related note "What happens inside models when they suddenly generalize?"
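The Fourier-and-trigonometry idea can be illustrated outside a neural network. The sketch below computes (a + b) mod p by representing each input as cosines and sines at a few frequencies, combining them with the angle-addition identities, and scoring each candidate answer c by cos(w(a + b - c)), which peaks when c equals (a + b) mod p. It is a hand-written illustration of the identities, not the trained model's weights; the function name and the choice of frequencies (1, 2, 3) are arbitrary assumptions, and p is assumed to be prime.

```python
import math

def fourier_mod_add(a, b, p, freqs=(1, 2, 3)):
    """Return the c in [0, p) that maximizes sum_k cos(2*pi*k*(a + b - c) / p).

    For prime p and frequencies in 1..p-1, every term equals 1 only when
    c == (a + b) % p, so the argmax recovers modular addition.
    """
    logits = [0.0] * p
    for k in freqs:
        w = 2 * math.pi * k / p
        # Angle-addition identities: build cos/sin of w*(a+b) from per-input features.
        cos_ab = math.cos(w * a) * math.cos(w * b) - math.sin(w * a) * math.sin(w * b)
        sin_ab = math.sin(w * a) * math.cos(w * b) + math.cos(w * a) * math.sin(w * b)
        for c in range(p):
            # cos(w*(a+b-c)) = cos(w*(a+b))*cos(w*c) + sin(w*(a+b))*sin(w*c)
            logits[c] += cos_ab * math.cos(w * c) + sin_ab * math.sin(w * c)
    return max(range(p), key=logits.__getitem__)

# Toy usage with a small prime modulus.
result = fourier_mod_add(5, 9, 11)
```

Summing over several frequencies widens the margin between the correct answer and its neighbours, which is one intuition for why a circuit of this shape generalizes to every input pair rather than only to memorized ones.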

The three-stage trajectory has implications for understanding RL-trained reasoning models. If base models already contain latent reasoning ability (see "Do base models already contain hidden reasoning ability?"), the question becomes: which stage does RL training target? If RL primarily accelerates Phase II (ID generalization), that would help explain why the choice of RL algorithm appears to matter little (see "Does the choice of RL algorithm actually matter for reasoning?"): different algorithms may trigger the same phase transition.

Inquiring lines that read this note 25

This note is a source for the following research framings, grouped by the broader line of inquiry each explores.

What structural biases does transformer attention create in language model outputs?

What is selective resonance and why do transformers not perform it?

Does recurrence enable reasoning capabilities that fixed-depth transformers cannot achieve?

How do knowledge graphs enable efficient multi-hop reasoning over alternatives?

What graph structures better support multi-hop reasoning than pairwise edges?

How does reasoning graph topology affect breakthrough insights and generalization?

Why do reasoning models fail at systematic problem-solving and search?

What limits mechanistic interpretability's ability to characterize models?

Can we detect and measure circuit formation before generalization emerges?

How does example difficulty affect learning efficiency in language models?

How do transformers generate harder solutions when mostly trained on easier problems?

How do training priors constrain what context information can override?

What explains the contextual variability of knowledge in transformers?

Why do continual learning scenarios trigger catastrophic forgetting and interference?

Does grokking in modular arithmetic follow the same three-phase learning trajectory?

Does model scaling alone produce compositional generalization without symbolic mechanisms?

Can single-hop knowledge automatically compose into multi-hop capability?

What determines success in training models on multiple tasks?

How do transformers stitch together learned behaviors when adapting to new tasks?

How does sequence length affect sparsity tolerance in models?

Can sparse attention methods be designed specifically for multi-hop reasoning tasks?

Related concepts in this collection 5

Do language models actually use their encoded knowledge?
Probes can detect that LMs encode facts internally, but do those encoded facts causally influence what the model generates? This explores the gap between knowing and doing.
encoding ≠ composition; this paper shows the mechanism for when composition emerges

Do base models already contain hidden reasoning ability?
Explores whether reasoning capability emerges during pre-training as a latent feature rather than being created by post-training methods like reinforcement learning or fine-tuning.
three-stage emergence framework for understanding what "unlocking" means

Does the choice of RL algorithm actually matter for reasoning?
Expert Iteration, PPO, and RC-RL show similar performance on reasoning tasks. The question is whether algorithm choice drives results or whether something deeper, like the pretrained model itself, sets the real limits.
RL may target specific phase transitions in the emergence trajectory

Do reasoning cycles in hidden states reveal aha moments?
What if the internal loops in model reasoning, visible in hidden-state topology, correspond to the reconsidering moments that happen during reasoning? This note explores whether graph cyclicity captures a mechanistic signature of insight.
cosine clustering is a representational-level analogue to the topological "aha moment"

Can neural networks learn compositional skills without symbolic mechanisms?
Do neural networks need explicit symbolic architecture to compose learned concepts, or can scaling alone enable compositional generalization? This asks whether compositionality is an architectural feature or an emergent property of scale.
shared condition: both findings show compositional reasoning requires training exposure to the compositional structure, not just individual components; query-level exposure (this note) and task-space coverage (that note) are the same constraint at different scales
