Can budget-tightening curricula improve reasoning efficiency more than fixed budgets?
This explores whether training a model on a schedule of shrinking token budgets (generous first, then progressively tighter) buys better reasoning efficiency than just training under one fixed budget — and why that staging helps.
The corpus answers directly: yes, because learning to reason well and learning to reason cheaply are two different jobs. Models trained with progressively tightening token budgets reach higher accuracy *and* better token efficiency than fixed-budget baselines. The curriculum splits training into an exploration phase (discover strategies while budgets are generous) and a compression phase (distill those strategies once the budget clamps down); see "Does gradually tightening token budgets beat fixed budget training?". A fixed budget forces both jobs to happen at once, and that is its disadvantage.
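The two-phase split described above can be sketched as a budget schedule. This is only an illustration: the function name `token_budget`, the default budgets (8192 and 1024 tokens), the 50% exploration fraction, and the linear tightening are all assumptions, not values from the source note. How the budget is enforced during training (truncation, a length penalty in the reward, or something else) is also not specified there and is left out.

```python
def token_budget(step: int, total_steps: int,
                 start_budget: int = 8192, end_budget: int = 1024,
                 explore_frac: float = 0.5) -> int:
    """Hypothetical curriculum: hold a generous budget, then tighten linearly.

    All defaults are illustrative assumptions, not figures from the source.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    explore_steps = int(total_steps * explore_frac)
    if step < explore_steps:
        # Exploration phase: generous budget so strategies can be discovered.
        return start_budget
    # Compression phase: interpolate from start_budget down to end_budget.
    compress_steps = total_steps - explore_steps
    progress = min(1.0, (step - explore_steps) / max(1, compress_steps - 1))
    return round(start_budget + (end_budget - start_budget) * progress)


def fixed_budget(step: int, budget: int = 1024) -> int:
    """Fixed-budget baseline: the same cap at every step."""
    return budget
```

Under these assumptions, `token_budget(0, 100)` returns the generous cap and `token_budget(99, 100)` returns the tight one, whereas `fixed_budget` asks the model to explore and compress under the same cap from the first step.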
Why does compressing late work at all? Because more thinking is not free upside. Accuracy is non-monotonic in thinking length: pushing one model from ~1,100 to ~16K thinking tokens dropped accuracy from 87.3% to 70.3%, as it overthought easy problems and underthought hard ones (see "Does more thinking time always improve reasoning accuracy?"). So there is genuine slack to cut: a tightening curriculum exploits the fact that the generous-budget version was partly wasting tokens rather than using them.
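To put the size of that slack in numbers, here is the arithmetic on the figures quoted above. The token counts are the approximate values from the text ("~1,100" and "~16K" read as 16,000), so the ratio is approximate too.

```python
# Approximate figures quoted in the text.
short_tokens, long_tokens = 1_100, 16_000   # thinking tokens
short_acc, long_acc = 87.3, 70.3            # accuracy in percent

token_ratio = long_tokens / short_tokens
acc_drop = round(short_acc - long_acc, 1)   # percentage points

print(f"{token_ratio:.1f}x more thinking tokens, {acc_drop} points lower accuracy")
# prints: 14.5x more thinking tokens, 17.0 points lower accuracy
```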
The more interesting question is whether the efficiency comes from the *budget schedule itself* or from training structure more broadly, and the corpus leans toward the latter. Reasoning models keep beating non-reasoning ones at any inference budget because training installs a protocol that makes extra tokens productive; the gap is about how reasoning was trained in, not raw compute at deploy time (see "Can non-reasoning models catch up with more compute?"). In the same spirit, RL training flips extended thinking from counterproductive self-doubt into useful gap analysis: training mediates the *quality* of reasoning, not just its quantity (see "Does extended thinking help or hurt model reasoning?"). A budget curriculum is one lever within that broader truth: it shapes when and how the model learns to spend tokens rather than adding capability.
There's a cheaper rival worth knowing about. If you only want brevity, you may not need a curriculum, or any retraining, at all. Verbose and concise chains of thought turn out to occupy distinct linear regions of activation space, and a single steering vector extracted from 50 paired examples cut chain-of-thought length by 67% with a 2.73x speedup and no accuracy loss (see "Can we steer reasoning toward brevity without retraining?"). That reframes the original question: a tightening curriculum earns its cost when you want the model to genuinely *learn* a more efficient reasoning policy, whereas inference-time steering buys compression off the shelf when you just want shorter output now.
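For intuition about what "a single steering vector" can mean, here is a minimal sketch using the difference of mean activations between concise and verbose examples, a common way to build steering directions. Whether the cited method extracts its vector exactly this way is an assumption, as are the function names, the toy two-dimensional activations, and the `strength` parameter. A real setup would read activations from a chosen layer of the model, which is omitted here.

```python
from statistics import fmean

Vector = list[float]


def mean_vector(rows: list[Vector]) -> Vector:
    """Element-wise mean over a list of equal-length activation vectors."""
    return [fmean(column) for column in zip(*rows)]


def steering_vector(verbose_acts: list[Vector], concise_acts: list[Vector]) -> Vector:
    """Direction pointing from the verbose mean toward the concise mean (assumed recipe)."""
    verbose_mean = mean_vector(verbose_acts)
    concise_mean = mean_vector(concise_acts)
    return [c - v for c, v in zip(concise_mean, verbose_mean)]


def steer(hidden: Vector, direction: Vector, strength: float = 1.0) -> Vector:
    """Shift one hidden state along the steering direction at inference time."""
    return [h + strength * d for h, d in zip(hidden, direction)]


# Toy activations standing in for paired verbose/concise examples.
verbose = [[1.0, 0.0], [3.0, 0.0]]
concise = [[0.0, 2.0], [0.0, 4.0]]
direction = steering_vector(verbose, concise)   # [-2.0, 3.0]
steered = steer([1.0, 1.0], direction, 0.5)     # [0.0, 2.5]
```

No weights change in this sketch, which is the point of the contrast with a curriculum: the model's policy is untouched and only its activations are nudged at inference time.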
One caution the corpus adds: efficiency gains measured on final accuracy can hide reasoning damage. Supervised fine-tuning raised benchmark scores while cutting the quality of intermediate inferential steps by 38.9%, producing right answers via post-hoc rationalization that standard metrics miss (see "Does supervised fine-tuning improve reasoning or just answers?"). So if you adopt budget-tightening, the success test isn't just "same accuracy, fewer tokens"; it's whether the compressed reasoning is still doing real inferential work underneath.
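That stricter success test can be written down as a three-part check. This is a sketch under stated assumptions: `EvalResult`, `compression_is_sound`, and both tolerance values are hypothetical, and `step_quality` stands in for whatever per-step measure you trust (the source note mentions an Information Gain metric but does not define how to compute it).

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    accuracy: float      # final-answer accuracy, in [0, 1]
    mean_tokens: float   # mean reasoning tokens per problem
    step_quality: float  # hypothetical per-step score; higher is better


def compression_is_sound(baseline: EvalResult, compressed: EvalResult,
                         max_acc_drop: float = 0.01,
                         max_quality_drop: float = 0.05) -> bool:
    """Pass only if accuracy holds, tokens fall, AND step quality holds.

    The tolerances are illustrative assumptions, not thresholds from the source.
    """
    keeps_accuracy = compressed.accuracy >= baseline.accuracy - max_acc_drop
    saves_tokens = compressed.mean_tokens < baseline.mean_tokens
    keeps_quality = (compressed.step_quality
                     >= baseline.step_quality * (1 - max_quality_drop))
    return keeps_accuracy and saves_tokens and keeps_quality


baseline = EvalResult(accuracy=0.80, mean_tokens=4000, step_quality=1.0)
healthy = EvalResult(accuracy=0.81, mean_tokens=1500, step_quality=0.98)
hollow = EvalResult(accuracy=0.81, mean_tokens=1500, step_quality=0.611)
```

With these made-up numbers, `healthy` passes and `hollow` fails even though both match on accuracy and token count, which is exactly the failure mode an accuracy-only test would miss.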
Sources (6 notes)
Models trained with progressively tightening token budgets consistently achieve higher accuracy and better token efficiency than fixed-budget baselines. The approach works by separating learning into exploration (discovering strategies with generous budgets) and compression (distilling them under constraints).
Increasing thinking tokens from ~1,100 to ~16K reduced benchmark accuracy from 87.3% to 70.3%, revealing a non-monotonic relationship where models overthink easy problems and underthink hard ones.
Reasoning models persistently outperform non-reasoning models regardless of inference budget because training instills a reasoning protocol that makes additional tokens productive. The gap is fundamentally about deployment mechanisms and training structure, not raw capability.
Vanilla models use thinking mode counterproductively, inducing self-doubt that degrades performance. RL training reverses this, transforming the same mechanism into beneficial gap analysis. Training mediates reasoning quality, not just quantity.
Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.
Supervised fine-tuning improves final-answer accuracy on benchmarks but cuts Information Gain by 38.9 percent, meaning models generate correct answers through post-hoc rationalization rather than genuine inferential steps. Standard metrics miss this degradation because they only measure final correctness.