SYNTHESIS NOTE

Can reasoning systems scale faster by exploring parallel paths instead?

Current recursive reasoning models refine a single latent trajectory deeply, which is slow. Could sampling multiple trajectories in parallel achieve better reasoning with lower latency, and would that scale differently than serial refinement?

Synthesis note · 2026-05-28 · sourced from Looped Models

Recursive Reasoning Models (RRMs) increase reasoning capability by iterating a shared transition function over a latent state — more iterations means more "thinking" without extending the output sequence. This is depth scaling, and it decouples reasoning depth from both parameter count and output length. GRAM (Generative Recursive reAsoning Models) argues this is only half the story: depth alone is insufficient because a single refinement path can become trapped in a suboptimal trajectory, and many problems have ambiguity or multiple valid solutions that a single converging path cannot represent.

The structural claim is that future recursive reasoners should be not only deep (repeated refinement) but also wide (maintaining and exploring multiple latent trajectories in parallel). GRAM operationalizes width by turning the latent transition stochastic and sampling several trajectories simultaneously. Crucially, width sidesteps the latency penalty that depth-only scaling incurs: sampling N trajectories runs in parallel, whereas adding N refinement steps is serial and accumulates wall-clock time.

This reframes the inference-scaling design space for latent architectures. It mirrors at the latent-state level what parallel-vs-sequential debates established at the token level — since Why does parallel reasoning outperform single chain thinking?, breadth often beats depth under a fixed budget because independent paths sample the solution distribution rather than inflating variance along one path. GRAM brings that lesson inside the recurrent block, where prior work like Can models reason without generating visible thinking tokens? had only scaled depth. The counterpoint to watch: since Can parallel architectures solve inherently sequential problems?, width cannot substitute for depth on inherently serial problems — the two axes are complements, not interchangeable knobs. Why it matters: it gives latent reasoning a second, latency-cheap scaling dimension and explains why deterministic RRMs underperform on multi-solution tasks.

Inquiring lines that read this note 141

This note is a source for these research framings, grouped by the broader line of inquiry each explores. Scan the bold lines of inquiry; follow any specific question forward.

Can reasoning systems scale faster by exploring parallel paths instead?

Inquiring lines that read this note 141

Related concepts in this collection 4

Related papers in this collection 8

Search by related questions 4