TOPIC

Looped Models

10 synthesis notes · 12 source papers

Can tiny recursive networks outperform massive language models?

Can a small network that recursively refines its reasoning on a latent state match or beat billion-parameter LLMs on hard reasoning puzzles? This challenges assumptions about scale and hierarchy in AI reasoning.

Can fixed points replace learned halt tokens in reasoning models?

Does stopping inference when a looped transformer's internal state stabilizes provide a better halting signal than training a dedicated token predictor? This matters for building adaptive compute without expensive special training.
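A minimal sketch of the stabilization idea, under stated assumptions: `loop_until_stable`, `toy_block`, the tolerance, and the loop cap are all hypothetical names and values chosen for illustration, and the toy contraction map merely stands in for a shared transformer block. This is not the method of any of the source papers, only the general pattern of halting once consecutive states stop changing.

```python
def loop_until_stable(step, state, tol=1e-6, max_loops=64):
    """Apply `step` repeatedly and halt once the state stops changing.

    Returns (final_state, loops_used, converged).
    """
    for n in range(1, max_loops + 1):
        new_state = step(state)
        # Largest per-coordinate change between consecutive iterates.
        delta = max(abs(a - b) for a, b in zip(new_state, state))
        state = new_state
        if delta < tol:
            return state, n, True
    # Loop budget exhausted without the state settling.
    return state, max_loops, False


def toy_block(h):
    # Hypothetical stand-in for a shared block: a contraction
    # whose unique fixed point is 2.0 in every coordinate.
    return [0.5 * x + 1.0 for x in h]


state, loops, converged = loop_until_stable(toy_block, [0.0, 10.0])
print(state, loops, converged)
```

In this sketch the number of loops depends on how far the starting state is from the fixed point, so compute adapts per input without a trained halting predictor. A map that never settles simply runs into `max_loops` and reports `converged=False`.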

Can looped computation replace parameter count in world models?

Does iteratively refining latent states through a shared transformer block achieve comparable performance to larger models while adapting computation depth per prediction step? This matters because world models struggle with long-horizon rollout error and computational cost.

Can reasoning be learned during pretraining rather than after?

Does building iterative computation into the pretraining phase itself allow language models to develop reasoning before post-hoc fine-tuning? And if so, does latent reasoning align better with outputs than explicit chain-of-thought?

How do looped language models actually improve reasoning in depth?

Mechanistic analysis investigates whether looping transformer layers creates genuinely new computation or reuses existing inferential stages. Understanding this distinction clarifies why recurrent depth can match standard scaling.

Does adding more loops always improve looped language models?

Conventional wisdom treats loop count as a dial: more loops should mean better reasoning. But does the empirical evidence support monotonic gains, or is there a point where additional loops become counterproductive?

Can stochastic latent reasoning let models explore multiple solutions?

When recursive reasoning models collapse to a single deterministic path, can introducing stochasticity into their latent transitions let them maintain uncertainty and consider alternative strategies instead? This matters because real problems often have multiple valid answers.

Source papers 12

The arXiv papers behind this sub-topic. Links may take you off-site to arxiv.org.

A Mechanistic Analysis of Looped Reasoning Language Models
Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM’s layers in the latent dimension, resulting i…
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
The remarkable capability of Transformers to do reasoning and few-shot learning, without any fine-tuning, is widely conjectured to stem from their ability to implicitly simulate a multi-step algorithm…
Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers
Looped architectures provide an inductive bias toward learning step-by-step procedures for tasks that require compositional reasoning. The depth of effective layers reached by looping determines the q…
Generative Recursive Reasoning
How should future neural reasoning systems implement extended computation? Recursive Reasoning Models (RRMs) offer a promising alternative to autoregressive sequence extension by performing iterative …
Less is More: Recursive Reasoning with Tiny Networks
Hierarchical Reasoning Model (HRM) is a novel approach using two small neural networks recursing at different frequencies. This biologically inspired method beats large language models (LLMs) on hard …
Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers
Large language models (LLMs) (Brown et al., 2020) are known to acquire substantial factual knowledge during pretraining, storing it in their parameters (Geva et al., 2023). However, how effectively th…
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling
Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel loop Transformers (PLT) al…
Looped Diffusion Language Models
Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models for language modeling, yet the effective design of transformer architectures for MDMs remains underexplo…
Looped World Models
Current world models face a fundamental tension: faithful long-horizon simulation demands deep computation, but deeper models are expensive to deploy and prone to compounding errors. We resolve this b…
Pushdown Layers: Encoding Recursive Structure in Transformer Language Models
Recursion is a prominent feature of human language, and fundamentally challenging for self-attention due to the lack of an explicit recursive-state tracking mechanism. Consequently, Transformer langua…
Recursive Language Models
We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inference strategy…
Scaling Latent Reasoning via Looped Language Models
Modern LLMs are trained to “think” primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data. We pr…