SYNTHESIS NOTE
Model Architecture and Internals · Training, RL, and Test-Time Scaling

Can tiny recursive networks outperform massive language models?

Does a small network that refines its reasoning through recursion on a latent state actually generalize better than billion-parameter LLMs on hard puzzles like ARC-AGI? What makes recursion more powerful than scale?

Synthesis note · 2026-06-03 · sourced from Reasoning Architectures

Autoregressive LLMs are fragile on hard puzzles because a single wrong token can invalidate an answer, and the usual patches (chain-of-thought and test-time compute) are expensive, data-hungry, and brittle. The Tiny Recursive Model (TRM) takes the opposite bet: a single 2-layer network with only 7M parameters that recurses on its own latent reasoning feature and progressively improves its final answer. It reaches 45% on ARC-AGI-1 and 8% on ARC-AGI-2, higher than most LLMs, including DeepSeek R1, o3-mini, and Gemini 2.5 Pro, with less than 0.01% of their parameters.
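To make the parameter claim concrete, here is a small arithmetic check of what "less than 0.01%" implies. The 7M figure comes from this note; the 671B total-parameter count for DeepSeek R1 is taken from its public model card and is not stated in this note. Parameter counts for o3-mini and Gemini 2.5 Pro are not public, so they are left out.

```python
# Arithmetic behind "less than 0.01% of their parameters".
TRM_PARAMS = 7_000_000  # 7M, as stated in this note

# 0.01% is 1 part in 10,000, so the claim only holds for models
# strictly larger than TRM_PARAMS * 10_000 parameters.
min_llm_params = TRM_PARAMS * 10_000  # 70 billion

# Assumption (not from this note): DeepSeek R1's published total size.
R1_PARAMS = 671_000_000_000
r1_fraction_percent = 100 * TRM_PARAMS / R1_PARAMS  # roughly 0.001%

print(f"claim holds only for models above {min_llm_params:,} parameters")
print(f"TRM is about {r1_fraction_percent:.4f}% of DeepSeek R1's size")
```

In other words, the comparison only makes sense against models above roughly 70B parameters, which DeepSeek R1 clears by about an order of magnitude.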

The key takeaway is what TRM removes relative to its predecessor HRM: no fixed-point theorem, no biological hierarchy, no two interacting networks, no extra halting forward pass. A single tiny network recursing beats the hierarchical version, which isolates recursion on a latent state (not scale, not hierarchy) as the source of generalization. (The authors are candid that no single choice is universally optimal: replacing self-attention with an MLP helped Sudoku but hurt other tasks, so architecture still needs per-problem tuning, and scaling laws for choosing it are still missing.)
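The single-network recursion described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the network is an untrained random 2-layer MLP, and the names `make_tiny_net`, `recursive_refine`, `n_latent`, and `n_cycles`, along with the loop counts and the zeroed input during the answer update, are all hypothetical choices made for the example.

```python
import math
import random


def make_tiny_net(dim, seed=0):
    """Build one small 2-layer MLP with random (untrained) weights.

    Hypothetical stand-in for the tiny network; it maps the
    concatenation of (x, y, z) to a new vector of size `dim`.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    w1 = [[rng.uniform(-scale, scale) for _ in range(3 * dim)] for _ in range(dim)]
    w2 = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]

    def net(x, y, z):
        inp = x + y + z  # list concatenation: question, answer, latent
        hidden = [math.tanh(sum(w * v for w, v in zip(row, inp))) for row in w1]
        return [math.tanh(sum(w * v for w, v in zip(row, hidden))) for row in w2]

    return net


def recursive_refine(net, x, n_latent=6, n_cycles=3):
    """Reuse ONE network to update a latent state z, then the answer y.

    Returns the answer after each cycle, so the progressive
    refinement can be inspected. Loop counts are assumptions.
    """
    dim = len(x)
    y = [0.0] * dim          # current answer embedding
    z = [0.0] * dim          # latent reasoning state
    no_input = [0.0] * dim   # assumption: answer update does not see x
    answers = []
    for _ in range(n_cycles):
        for _ in range(n_latent):
            z = net(x, y, z)         # recurse on the latent state
        y = net(no_input, y, z)      # same weights refine the answer
        answers.append(y)
    return answers


net = make_tiny_net(dim=4, seed=0)
answers = recursive_refine(net, [0.5, -0.2, 0.1, 0.9])
print(len(answers), "refined answers, one per cycle")
```

With these illustrative defaults the same two layers are applied 3 × (6 + 1) = 21 times, so the effective depth comes from weight reuse rather than from added parameters. There is no second network and no separate halting pass in the loop.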

This sharpens the vault's recurrence cluster. TRM directly simplifies the model discussed in "Can recurrent hierarchies achieve reasoning that transformers cannot?" (HRM), and it agrees mechanistically with "How do looped transformer layers actually behave during inference?": recursion re-applies computation on a latent state, and that reuse, at tiny scale, is what generalizes.

Original note title

a tiny two-layer network recursing on its latent reasoning state out-generalizes billion-parameter LLMs on hard puzzles — recursion not scale or hierarchy drives the gain