Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers

Paper · arXiv 2606.18206 · Published June 16, 2026

Looped architectures provide an inductive bias toward learning step-by-step procedures for tasks that require compositional reasoning. The number of effective layers reached by looping determines the quality of the solution these models find. Like deep architectures, looped architectures are prone to a depth-induced signal propagation problem, which worsens as the halting decision is postponed. In this paper, we address the signal propagation issue by using pre-norm layers and residual scaling. Building on these architectural modifications, we propose FPRM: a Transformer-based Fixed-Point Reasoning Model that uses fixed-point convergence as an end-to-end halting mechanism in a looped architecture. We show that fixed-point halting allows FPRM to adapt its compute to the difficulty of the task. FPRM proves effective on common reasoning benchmarks, namely Sudoku, Maze, state-tracking and ARC-AGI. The implementation can be found here.
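To make the idea of fixed-point halting concrete, the following is a minimal NumPy sketch and not the FPRM implementation. The abstract does not specify the update rule or the convergence test, so everything here is an assumption made for illustration: the names `rms_norm`, `make_step`, and `fixed_point_loop`, the tanh sublayer standing in for a Transformer block, the damped form of the scaled residual (chosen so that a fixed point exists), the re-injection of the input `x` at each iteration, and the relative-change threshold `tol`.

```python
import numpy as np

def rms_norm(h, eps=1e-6):
    # Normalize each row to roughly unit root-mean-square (a common pre-norm choice).
    return h / np.sqrt(np.mean(h * h, axis=-1, keepdims=True) + eps)

def make_step(W, x, alpha):
    # One weight-tied loop iteration (hypothetical toy block):
    # pre-norm -> sublayer -> residual branch scaled by alpha.
    # The "- h" term makes this a damped update, an assumption of this sketch.
    def step(h):
        return h + alpha * (np.tanh(rms_norm(h) @ W + x) - h)
    return step

def fixed_point_loop(step, h0, tol=1e-4, max_steps=64):
    # Apply the same block repeatedly and halt once the state stops changing.
    h, n = h0, 0
    for n in range(1, max_steps + 1):
        h_next = step(h)
        # Relative change between consecutive iterates.
        delta = np.linalg.norm(h_next - h) / (np.linalg.norm(h) + 1e-12)
        h = h_next
        if delta < tol:
            break  # approximate fixed point reached: halt
    return h, n  # n is the number of loop iterations actually used

# Usage: a small random block; if tol is never met, the loop stops at max_steps.
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((8, 8))
x = rng.standard_normal((4, 8))
h_star, n_used = fixed_point_loop(make_step(W, x, alpha=0.5), np.zeros((4, 8)))
print("iterations used:", n_used)
```

In this sketch the halting decision needs no separate halting head: the loop stops when consecutive states agree to within `tol`, and `max_steps` only acts as a safety cap.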

Introduction. Reasoning in neural networks has increasingly been framed as a problem of scaling test-time compute: a model should be able to spend more computation on inputs it finds harder (OpenAI, 2024; Snell et al., 2024). Doing so requires two ingredients: (1) flexibility, the possibility of spending a variable amount of compute on a problem; and (2) adaptivity, a way to decide how much compute to spend, i.e., when to halt the computation. The standard way to achieve both is through a Chain-of-Thought (CoT) mechanism (Wei et al., 2022). With CoT, the model scales compute through verbalization and makes halting decisions by predicting a specialized halting token. However, this emergent behavior requires a special training regime and hand-crafted reasoning traces (Guo et al., 2025). This makes the method complex and undermines the desirable property of end-to-end training.

Discussion / Conclusion. Our experiments support three broader observations about FPRM and looped reasoning models in general, which we discuss in turn.

Looped fixed-point models are adaptive. FPRM adapts to the difficulty of the problem more effectively than TRM (Figures 4, 5, and Section 4.3), using fewer effective layers (compute) while achieving better performance. This is a consequence of FPRM halting closer to the point at which accuracy saturates (Figure 12). In contrast, TRM with its ACT halting mechanism halts either too early, resulting in lower performance, or too late, using excessive compute.

Scaling behavior of FPRM. The results of Section 4 combine into a coherent picture of how FPRM scales its computation. First, with better signal propagation, FPRM is able to utilize compute more efficiently (Figure 6). Second, because more difficult problems require more compute (Merrill et al., 2024; Movahedi et al., 2025), the better test-time scaling of FPRM is mostly visible in harder tasks (Figures 4, 5a).
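The adaptivity described above can be illustrated with a toy experiment in which each example in a batch halts independently, so that the number of loop iterations plays the role of effective depth. This is only a sketch under stated assumptions and does not reproduce FPRM or TRM: the function `batched_fixed_point`, the linear toy update, and the use of a per-example contraction rate as a stand-in for "difficulty" are all hypothetical choices made for illustration.

```python
import numpy as np

def batched_fixed_point(step, h0, tol=1e-4, max_steps=256):
    # Iterate a shared update on a batch; each row halts on its own.
    h = h0.copy()
    active = np.ones(h.shape[0], dtype=bool)  # rows still iterating
    steps = np.zeros(h.shape[0], dtype=int)   # iterations used per row
    for _ in range(max_steps):
        if not active.any():
            break
        h_next = step(h)
        # Per-row relative change between consecutive iterates.
        delta = np.linalg.norm(h_next - h, axis=-1) / (
            np.linalg.norm(h, axis=-1) + 1e-12
        )
        h[active] = h_next[active]  # halted rows keep their final state
        steps[active] += 1
        active &= delta >= tol      # a row halts once its change is below tol
    return h, steps

# Toy "difficulty": row 0 contracts quickly (easy), row 1 slowly (hard).
rates = np.array([0.2, 0.9])[:, None]
target = np.ones((2, 3))
toy_step = lambda h: target + rates * (h - target)

h_final, steps_used = batched_fixed_point(toy_step, np.zeros((2, 3)))
print("iterations per example:", steps_used)
```

In this toy setting the slowly contracting row uses more iterations than the quickly contracting one, which mirrors the qualitative claim that harder inputs receive more compute; it says nothing about the quantitative results reported in the figures.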