What makes fixed-point convergence better than learned halt tokens?
This explores why a model can better decide *when to stop thinking* by watching its own internal state settle into a stable point (fixed-point convergence) than by training it to emit a special 'I'm done' token (a learned halt token).
The core result comes from looped transformers that re-apply the same layers over and over: instead of training a separate signal that says "stop now," the model simply watches whether its latent state stops changing between iterations. When the state reaches a fixed point, meaning successive passes barely move it, that is the natural cue that more compute won't buy more accuracy. This criterion tracks the accuracy-saturation point more closely than a trained halt token does, and it requires no special training regime (see "Can fixed points replace learned halt tokens in reasoning models?").
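The halting rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation from the cited note: `step` is a hypothetical stand-in for one pass of a weight-tied block (a toy contraction, so a fixed point is guaranteed to exist), and `tol` and `max_iters` are assumed parameters chosen for the example.

```python
import math

def step(state, inputs, weight=0.5):
    """Hypothetical stand-in for one pass of a looped block.

    tanh(weight * s + x) with |weight| < 1 is a contraction in each
    coordinate, so repeated application settles to a unique fixed point.
    """
    return [math.tanh(weight * s + x) for s, x in zip(state, inputs)]

def run_until_fixed_point(inputs, tol=1e-6, max_iters=100):
    """Re-apply `step` until the state barely moves, then halt.

    Returns (final_state, iterations_used, converged).
    """
    state = [0.0] * len(inputs)
    for i in range(1, max_iters + 1):
        new_state = step(state, inputs)
        # Relative change between successive passes (L2 norm).
        delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_state, state)))
        scale = math.sqrt(sum(a * a for a in new_state)) + 1e-12
        state = new_state
        if delta / scale < tol:
            return state, i, True   # state has settled: stop computing
    return state, max_iters, False  # compute budget exhausted without settling

state, iters, converged = run_until_fixed_point([0.3, -0.7, 1.2])
```

Note that the stopping test has no trainable parameters of its own: it only compares two successive states, which is why no extra training signal is needed in this sketch.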
The deeper reason this works is architectural. Looped models gain their reasoning power from iterating in depth rather than growing wider, and recursion gives them state-tracking and compositional generalization that scaling parameters alone can't (see "Can models learn by looping instead of growing larger?"). Because the computation *is* an iterative refinement of a latent state, convergence of that state is an intrinsic, free signal: it falls out of the mechanism. A learned halt token, by contrast, is a bolt-on. It asks the model to *predict* when it should be done, which is a separate skill that has to be trained and calibrated, and which can drift from the actual point where the answer stabilizes.
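The contrast between the two halting rules can be made concrete with a deliberately simple toy. Everything here is an assumption for illustration: the scalar update `0.5 * s + bias_input` stands in for the looped block, and the halt head is a hand-set logistic unit with hypothetical parameters `w` and `b`, not a trained one. A real learned halt signal can be far better calibrated than this; the sketch only shows how a rule with its own parameters can disagree with the point where the state actually settles.

```python
import math

def first_halt_steps(bias_input, w=4.0, b=-7.2, tol=1e-3, max_steps=50):
    """Return (head_step, fp_step): the first step at which each rule fires.

    head_step: a hypothetical halt head, sigmoid(w * s + b) > 0.5.
    fp_step:   the fixed-point rule, |s_new - s_old| < tol.
    Either value is None if that rule never fires within max_steps.
    """
    s = 0.0
    head_step = None
    fp_step = None
    for t in range(1, max_steps + 1):
        new_s = 0.5 * s + bias_input  # toy contraction; fixed point is 2 * bias_input
        halt_prob = 1.0 / (1.0 + math.exp(-(w * new_s + b)))
        if head_step is None and halt_prob > 0.5:
            head_step = t
        if fp_step is None and abs(new_s - s) < tol:
            fp_step = t
        s = new_s
    return head_step, fp_step

print(first_halt_steps(1.0))  # (4, 11): head fires while the state is still moving
print(first_halt_steps(0.5))  # (None, 10): head never fires; fixed-point rule still halts
```

In the first call the head's threshold happens to sit near the fixed point, so it fires early; in the second the fixed point has moved and the same head never fires at all, while the state-change test adapts without any retuning.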
What makes this more than a tuning trick is a cautionary counterpoint in the corpus: LLMs are surprisingly bad at genuinely *running* iterative procedures in latent space. Asked to perform iterative numerical methods, they tend to pattern-match a memorized template and emit a plausible-but-wrong value rather than actually iterate, and this persists across scale (see "Do large language models actually perform iterative optimization?"). So fixed-point halting isn't valuable just because it's elegant. It's valuable because it forces a model to *demonstrate* convergence in its state rather than declare completion from a learned habit. A halt token can fire on the basis of surface familiarity; a fixed point can't be faked as easily, because it requires the iteration to actually settle.
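The difference between declaring an answer and demonstrating convergence can be illustrated with an ordinary numerical method. The sketch below uses Newton's iteration for a square root as an assumed example (it is not taken from the cited notes): a plausible-looking value fails a fixed-point check, while a value reached by actually iterating passes it. The names `newton_sqrt_step`, `residual`, and `iterate` are hypothetical helpers.

```python
import math

def newton_sqrt_step(x, target=2.0):
    """One Newton update for solving x * x = target."""
    return 0.5 * (x + target / x)

def residual(g, x):
    """How far x is from being a fixed point of g."""
    return abs(g(x) - x)

def iterate(g, x0, tol=1e-12, max_iters=50):
    """Apply g repeatedly until successive values differ by less than tol."""
    x = x0
    for i in range(1, max_iters + 1):
        nxt = g(x)
        if abs(nxt - x) < tol:
            return nxt, i
        x = nxt
    raise RuntimeError("did not converge within max_iters")

claimed = 1.41                            # looks right, but was never iterated to
settled, n_iters = iterate(newton_sqrt_step, 1.0)

print(residual(newton_sqrt_step, claimed) > 1e-3)   # True: one more step still moves it
print(residual(newton_sqrt_step, settled) < 1e-12)  # True: the iteration has settled
print(abs(settled - math.sqrt(2.0)) < 1e-12)        # True
```

The check never asks whether the value looks like a familiar answer; it only asks whether applying the update again changes it, which is the property the fixed-point halting criterion relies on.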
There's a wider pattern here worth noticing: across the corpus, signals and behaviors that emerge from a model's own internal dynamics tend to be more trustworthy than ones grafted on by training. RL fine-tuning often sharpens template-matching rather than installing real procedures, so trained signals can encode memorization instead of competence (see "Do fine-tuned language models actually learn optimization procedures?"). On constrained-optimization tasks, LLMs plateau around 55–60% constraint satisfaction regardless of scale, which suggests that neither more parameters nor more training automatically buys real iterative reasoning (see "Do larger language models solve constrained optimization better?"). Against that backdrop, fixed-point convergence reads as part of a broader bet: the most reliable way to know a model is finished is to read its mechanism, not to teach it to announce itself.
If you want to follow this thread further, the most interesting tension is between architectures that recur in latent space and the things autoregressive token-by-token generation structurally *can't* do, such as retracting a committed token during constraint solving (see "Why does autoregressive generation fail at constraint satisfaction?"). Halting is one place where computing in latent state quietly outperforms computing in emitted tokens.
Sources (6 notes)
FPRM shows that looped transformers halt more accurately by detecting when their latent state reaches a fixed point, calibrating compute closer to the accuracy-saturation point than learned halt tokens without requiring special training regimes.
Models that re-apply layers in recurrent depth outperform larger feedforward networks on reasoning tasks. This works because recursion enables state tracking and compositional generalization that parameter scaling alone cannot achieve, with convergence signals providing natural halting.
Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.
Even GRPO-trained models show sharp performance drops on out-of-distribution variants (N-1 test sets) compared to in-distribution problems, indicating RL optimizes template-matching rather than genuine problem-solving procedures.
Across constrained-optimization tasks, LLMs converge to ~55–60% constraint satisfaction independent of architecture, parameter count, or training regime. Reasoning models do not systematically outperform standard models, suggesting a fundamental ceiling rather than a scaling gap.
The performance ceiling on constraint satisfaction problems is not a model-quality issue but an architectural limitation: autoregressive transformers cannot retract emitted tokens, while CSP solvers fundamentally depend on discarding invalid partial assignments. Symbolic solver integration works because it supplies what the architecture lacks.