INQUIRING LINE

Model Architecture and Internals · Training, RL, and Test-Time Scaling · Reasoning, Retrieval, and Evaluation · cross-cluster

How stable are the fixed points in recurrent transformer blocks?

This explores whether 'looped' or recurrent transformer blocks — ones that feed their own latent state back through the same layers repeatedly — actually settle into a stable resting state, and whether that settling is reliable enough to build on.

A fixed point here is a latent state that stops changing when you keep running the loop. The corpus suggests the answer is encouraging but conditional: fixed points are stable enough to be useful as a control signal, yet the recurrence that produces them is doing real computational work, not just relaxing into a resting state.

The strongest direct evidence comes from work on adaptive halting. Rather than training a model to emit a special 'stop' token, you can watch the looped transformer's latent state and stop when it converges, that is, when successive iterations barely change it (see 'Can fixed points replace learned halt tokens in reasoning models?'). That this works *better* than a learned halt token says something about stability: the fixed point arrives close to where accuracy saturates, so convergence is a trustworthy proxy for 'the model is done thinking.' Stability here is practical, not just theoretical: you can calibrate compute by it without a special training regime.
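The halting rule described above can be sketched in a few lines. This is a toy illustration, not the FPRM method: `step` is a hypothetical stand-in for one pass through a looped block (here a simple contraction toward `tanh(x)`), and the tolerance `tol` and cap `max_iters` are assumed values chosen for the example.

```python
import math

def step(state, x):
    """Hypothetical stand-in for one pass through a looped block.

    Toy contraction: each coordinate moves halfway toward tanh(x_i).
    A real looped transformer would apply attention and MLP layers here.
    """
    return [0.5 * s + 0.5 * math.tanh(xi) for s, xi in zip(state, x)]

def run_until_fixed_point(x, tol=1e-6, max_iters=100):
    """Iterate `step` and halt once successive states barely differ."""
    state = [0.0] * len(x)
    for n in range(1, max_iters + 1):
        new_state = step(state, x)
        # Largest per-coordinate change between successive iterations.
        delta = max(abs(a - b) for a, b in zip(new_state, state))
        state = new_state
        if delta < tol:
            return state, n, True   # converged: stop spending compute
    return state, max_iters, False  # hit the cap without converging

state, iters, converged = run_until_fixed_point([0.3, -1.2, 2.0])
print(iters, converged)
```

No halt token is learned here: the stopping decision comes entirely from the size of the change between iterations, which is the property the adaptive-halting work relies on.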

But stability isn't the same as triviality. The Hierarchical Reasoning Model shows why the recurrence matters: it runs two coupled loops at different speeds (slow abstract planning, fast detailed computation), and that nested recurrence is what lets a 27M-parameter model solve Sudoku and mazes that defeat much larger chain-of-thought systems (see 'Can recurrent hierarchies achieve reasoning that transformers cannot?'). The interesting tension: the model has to keep moving through intermediate states to escape the complexity ceiling of fixed-depth transformers, yet the whole scheme depends on those latent dynamics being controllable. The fixed point is the destination; the trajectory toward it is where the reasoning lives.
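To make the two-timescale structure concrete, here is a minimal sketch of a nested loop in which a fast state takes several steps for every single update of a slow state. The update rules `fast_step` and `slow_step` are hypothetical scalar contractions invented for this example; they are not the Hierarchical Reasoning Model's actual modules, and the cycle counts are assumed values.

```python
def fast_step(z_fast, z_slow, x):
    # Hypothetical fast module: relaxes toward a target set by the
    # input and the current slow state.
    return 0.5 * z_fast + 0.5 * (x + 0.5 * z_slow)

def slow_step(z_slow, z_fast):
    # Hypothetical slow module: updates once per cycle, using wherever
    # the fast state has got to.
    return 0.5 * z_slow + 0.25 * z_fast

def nested_loop(x, slow_cycles=40, fast_steps=4):
    """Run `fast_steps` fast updates per slow update, for `slow_cycles` cycles."""
    z_fast, z_slow = 0.0, 0.0
    trajectory = []
    for _ in range(slow_cycles):
        for _ in range(fast_steps):
            z_fast = fast_step(z_fast, z_slow, x)
        z_slow = slow_step(z_slow, z_fast)
        trajectory.append((z_fast, z_slow))
    return z_fast, z_slow, trajectory

z_fast, z_slow, trajectory = nested_loop(3.0)
print(z_fast, z_slow)
```

In this toy system the joint fixed point satisfies `z_fast = x + 0.5 * z_slow` and `z_slow = 0.5 * z_fast`, so for `x = 3.0` the states approach 4 and 2. The `trajectory` list records the intermediate states, which is the part of the computation the paragraph above calls "where the reasoning lives."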

A related angle is what happens when a transformer attends to its *own* latents as a feedback loop. TransformerFAM adds exactly this kind of recurrence and finds it fosters emergent working memory across indefinitely long inputs, with no extra weights (see 'Can models learn working memory by attending to their own latents?'). That this stays well-behaved at 1B, 8B, and 24B scales is itself an empirical stability result: feeding a model its own state back in doesn't blow up; it accumulates into something usefully memory-like. A looser analogue is the outer loop of self-training: self-improving transformers repeatedly generate solutions, filter for correctness, and retrain, and their out-of-distribution performance improves exponentially across rounds without saturating (see 'Can transformers improve exponentially by learning from their own correct solutions?').

The quieter caution comes from work on what transformers actually learn. Compositional reasoning often reduces to matching memorized subgraphs rather than running a stable algorithm, with errors compounding across steps (see 'Do transformers actually learn systematic compositional reasoning?'), and foundation models lean on slice-dependent heuristics rather than unified world models (see 'Do foundation models learn world models or task-specific shortcuts?'). The implication for fixed points: a loop can converge cleanly and still converge to the wrong answer on out-of-distribution inputs. Stability of the dynamics and correctness of the destination are separate questions: the latent state may settle confidently onto a heuristic that doesn't generalize.
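The gap between stable dynamics and a correct destination can be shown with a deliberately simple sketch. Everything here is an assumption made for illustration: the "true" task is squaring a number, and `heuristic_target` is a hypothetical learned shortcut (the tangent line to `x**2` at `x = 1`) that is accurate only near the inputs it was fitted to.

```python
def heuristic_target(x):
    # Hypothetical learned shortcut: tangent line to x**2 at x = 1.
    # Accurate near x = 1, increasingly wrong further away.
    return 2.0 * x - 1.0

def settle(x, tol=1e-9, max_iters=200):
    """Iterate a contraction toward the heuristic's answer until it stops moving."""
    z = 0.0
    for n in range(1, max_iters + 1):
        z_next = 0.5 * z + 0.5 * heuristic_target(x)
        if abs(z_next - z) < tol:
            return z_next, n
        z = z_next
    return z, max_iters

for x in (1.0, 1.1, 5.0):
    z, n = settle(x)
    print(f"x={x}: settled to {z:.4f} after {n} steps, true value {x**2:.4f}")
```

The loop converges within the iteration cap for every input, including `x = 5.0`, where the settled value is far from the true square. A convergence check alone would report all three runs as equally "done."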

Sources (6 notes)

Can fixed points replace learned halt tokens in reasoning models?

FPRM shows that looped transformers halt more accurately by detecting when their latent state reaches a fixed point, calibrating compute closer to the accuracy-saturation point than learned halt tokens without requiring special training regimes.

Can recurrent hierarchies achieve reasoning that transformers cannot?

The Hierarchical Reasoning Model couples slow abstract planning with fast detailed computation across two timescales, achieving near-perfect performance on Sudoku and mazes where chain-of-thought methods fail completely. With only 27M parameters and 1,000 samples, HRM escapes the AC0/TC0 complexity ceiling that constrains fixed-depth transformers.

Can models learn working memory by attending to their own latents?

TransformerFAM demonstrates that adding a feedback loop lets transformers attend to their own latent representations, fostering emergent working memory for indefinitely long inputs. The approach requires no additional weights and improves long-context performance at 1B, 8B, and 24B scales.

Can transformers improve exponentially by learning from their own correct solutions?

Standard transformers generalize from 10-digit to 100-digit addition by repeatedly generating solutions, filtering for correctness, and retraining—showing exponential (not linear) out-of-distribution improvement across rounds without saturation.

Do transformers actually learn systematic compositional reasoning?

Research shows transformers succeed on in-distribution tasks by memorizing computation subgraphs from training data, not by learning systematic rules. They fail drastically on novel compositions, with errors compounding across reasoning steps.

Do foundation models learn world models or task-specific shortcuts?

Inductive bias probes show transformers trained on orbital mechanics and games learn predictive patterns, not unified world structure. Fine-tuning reveals nonsensical, slice-dependent laws; circuit analysis shows arithmetic relies on range-matching heuristics, not algorithms.
