SYNTHESIS NOTE
Training, RL, and Test-Time Scaling

Can inference compute replace scaling up model size?

Explores whether smaller models given more thinking time during inference can match larger models. Matters because it reshapes deployment economics and compute allocation strategies.

Synthesis note · 2026-02-20 · sourced from Test Time Compute
How should we allocate compute budget at inference time?

Snell et al. (2024) demonstrated that giving a model a fixed but non-trivial amount of inference-time compute can be more effective than scaling model parameters, at least on hard prompts. This suggests pretraining and inference compute are not independent budgets: within limits, one can be traded for the other.

The practical implication is economic. A smaller model running with more inference compute may match the capability of a larger model running with less. Inference compute is elastic (adjustable per query), whereas pretraining compute is a sunk cost once the model is trained. This creates an optimization lever that did not exist when compute budgets were set only at training time.
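To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes the common approximations of about 6 FLOPs per parameter per training token and about 2 FLOPs per parameter per generated token. The function names and the model sizes are hypothetical, chosen only for illustration, and matching FLOPs says nothing by itself about matching accuracy.

```python
def pretrain_flops(n_params: float, n_train_tokens: float) -> float:
    """Approximate training cost: ~6 FLOPs per parameter per training token (assumed rule of thumb)."""
    return 6.0 * n_params * n_train_tokens


def inference_flops(n_params: float, n_generated_tokens: float) -> float:
    """Approximate generation cost: ~2 FLOPs per parameter per generated token (assumed rule of thumb)."""
    return 2.0 * n_params * n_generated_tokens


def matched_token_budget(small_params: float, large_params: float,
                         large_tokens_per_query: float) -> float:
    """Tokens per query the small model can generate at the same inference FLOPs as the large model."""
    return large_tokens_per_query * large_params / small_params


# Hypothetical sizes: a 1B-parameter model versus a 14B-parameter model
# that answers each query in 500 generated tokens.
small, large = 1e9, 14e9
budget = matched_token_budget(small, large, 500)
print(f"FLOP-matched budget for the small model: {budget:.0f} tokens per query")
```

Under these assumptions the smaller model can spend that budget on longer reasoning, multiple samples, or search. Because the budget is set per query, it can also be lowered for easy queries, which is the elasticity described above.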

However, the substitution has limits. Base model capabilities bound what inference compute can achieve: it can extend performance within the model's existing capability frontier, but cannot create capabilities the model lacks entirely. See "Can non-reasoning models catch up with more compute?" for evidence of where this limit becomes visible.
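One way to see why such a limit exists is a toy model of repeated sampling. This is an illustrative assumption, not the method of the cited work: each sample is taken to be independently correct with probability p, and a perfect verifier is assumed to pick out any correct sample. The function name `coverage` is hypothetical.

```python
def coverage(p: float, n: int) -> float:
    """Probability that at least one of n independent samples is correct,
    given a per-sample success probability p (toy model, perfect verifier assumed)."""
    return 1.0 - (1.0 - p) ** n


# A small but non-zero success rate is amplified by more samples.
for n in (1, 8, 64):
    print(f"p=0.05, n={n:>2}: coverage={coverage(0.05, n):.3f}")

# A success rate of zero stays at zero regardless of the sampling budget.
print(f"p=0.00, n=1000: coverage={coverage(0.0, 1000):.3f}")
```

In this toy setting, extra inference compute multiplies whatever chance of success the base model already has. When that chance is zero, no sampling budget changes the outcome.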

Original note title

test-time compute can substitute for model parameter scaling on hard prompts