SYNTHESIS NOTE

Should extra compute refine one model or build many?

When a single model stops improving after multiple training epochs, is it better to keep refining that model or spend compute building a diverse population of models whose predictions aggregate better?

Synthesis note · 2026-06-27 · sourced from Reasoning Critiques

Compute is now growing faster than the supply of high-quality text, which forces a question the field used to dodge: once you have made enough passes over your fixed corpus that a single model stops improving, what do you do with the remaining budget? The instinct is to keep refining that one model. q0 argues that this instinct is wrong because a single model saturates within a few epochs: further passes hit diminishing returns long before the budget is spent. The alternative is to spend the surplus compute building a population of diverse models and aggregating their predictions, which, q0 reports, reaches a lower validation loss than any single refined model.
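A minimal sketch of why aggregation can help, assuming members are combined by uniformly averaging their predictive distributions (the note does not specify q0's exact combination rule). The member distributions are made-up numbers, and `mixture` and `nll` are hypothetical helper names. Because negative log is convex, the mixture's cross-entropy can never exceed the members' average cross-entropy (Jensen's inequality). Beating the best single refined model is q0's empirical claim, which this toy does not demonstrate.

```python
import math

def mixture(member_probs):
    """Uniformly average K members' predictive distributions over the same vocabulary."""
    k = len(member_probs)
    n = len(member_probs[0])
    return [sum(p[i] for p in member_probs) / k for i in range(n)]

def nll(probs, target):
    """Negative log-likelihood (cross-entropy) assigned to the true target index."""
    return -math.log(probs[target])

# Hypothetical next-token distributions from three diverse members over a 3-token vocabulary.
members = [
    [0.70, 0.20, 0.10],
    [0.40, 0.50, 0.10],
    [0.60, 0.10, 0.30],
]
target = 0

ens = mixture(members)
mean_member_loss = sum(nll(p, target) for p in members) / len(members)
ensemble_loss = nll(ens, target)
print(f"mean member loss: {mean_member_loss:.4f}")
print(f"ensemble loss:    {ensemble_loss:.4f}")
```

The gap grows as members disagree more, which is why the population is built for diversity rather than for many copies of the same model.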

The conceptual move is from optimization toward something closer to Bayesian model averaging: q0 grounds the design in Solomonoff induction's idea that you should weight many hypotheses rather than commit to one. q0 reduces this to three primitives: a cyclic learning-rate/weight-decay schedule that anti-correlates the two so that training collects diverse snapshots; chain distillation, in which each model trains against its predecessor so that quality compounds; and a learned prior that selects and weights members for any given inference budget. The headline is efficiency: matching a 256-epoch ensemble with ~56 epochs.
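The first primitive can be sketched as follows, assuming a cosine shape within each cycle and a snapshot saved at the end of each cycle, where the learning rate bottoms out. The note says only that the schedule anti-correlates learning rate and weight decay; the cosine shape, the direction (learning rate falling while weight decay rises), the function names, and all numeric values are illustrative assumptions, not q0's published settings. Chain distillation and the learned prior are not sketched here.

```python
import math

def anticorrelated_schedule(step, cycle_len, lr_max, lr_min, wd_max, wd_min):
    """Return (learning_rate, weight_decay) for one step of a cyclic schedule.

    Hypothetical shape: within each cycle the learning rate follows a cosine from
    lr_max down to lr_min while weight decay mirrors it, rising from wd_min to
    wd_max, so the two are anti-correlated. Not q0's published schedule.
    """
    t = (step % cycle_len) / cycle_len              # position in the current cycle, in [0, 1)
    cos_term = 0.5 * (1.0 + math.cos(math.pi * t))  # 1.0 at cycle start, approaches 0 at cycle end
    lr = lr_min + (lr_max - lr_min) * cos_term
    wd = wd_max - (wd_max - wd_min) * cos_term
    return lr, wd

def snapshot_steps(total_steps, cycle_len):
    """Steps at which to save an ensemble member: the last step of each cycle (lowest LR)."""
    return [s for s in range(total_steps) if (s + 1) % cycle_len == 0]

# Illustrative values only: 3 cycles of 100 steps.
sched = [anticorrelated_schedule(s, 100, lr_max=3e-4, lr_min=3e-5, wd_max=0.1, wd_min=0.01)
         for s in range(300)]
snaps = snapshot_steps(300, 100)  # [99, 199, 299]
print(snaps)
```

Each restart jumps back to high learning rate and low weight decay, so consecutive snapshots are taken from separate cycles rather than from one converging trajectory.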

This connects to a recurring pattern in the vault: diversity is an objective worth optimizing, not a side effect. Set beside "Should training maximize diversity when models feed into search?", q0 reads as the pretraining-time analogue of that test-time argument: both say that converging on one best answer is the wrong target when you will aggregate or search downstream.

The honest limitation is inference cost: K snapshots mean K forward passes per input, which the authors concede is prohibitive in many deployments. They note that the ensemble can be distilled back into a single student, so the practical payoff may ultimately route through distillation, and the "population vs. single model" framing is a training-time stance, not necessarily a deployment-time one. The diversity-vs-refinement tradeoff is real; the open question is whether it survives the distillation step.
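A minimal sketch of the distillation route, assuming standard soft-target knowledge distillation: the student is trained to minimize cross-entropy against the uniform mixture of the K members' distributions. The note does not say which distillation objective q0 uses; `ensemble_teacher`, `log_softmax`, `distillation_loss`, and the numbers below are hypothetical. The teacher costs K forward passes per training example, but only the student runs at inference.

```python
import math

def ensemble_teacher(member_probs):
    """Uniform mixture of K members' distributions (costs K forward passes per input)."""
    k = len(member_probs)
    return [sum(p[i] for p in member_probs) / k for i in range(len(member_probs[0]))]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def distillation_loss(teacher_probs, student_logits):
    """Cross-entropy of the student's distribution against the ensemble's soft targets.

    This differs from KL(teacher || student) only by the teacher's entropy, a constant
    with respect to the student, so both give the same student gradients.
    """
    student_log_probs = log_softmax(student_logits)
    return -sum(t * lp for t, lp in zip(teacher_probs, student_log_probs))

# Hypothetical: three snapshots' next-token distributions for one input.
members = [
    [0.70, 0.20, 0.10],
    [0.40, 0.50, 0.10],
    [0.60, 0.10, 0.30],
]
teacher = ensemble_teacher(members)

# A student that exactly reproduces the teacher reaches the minimum loss (the teacher's
# entropy); any other student, e.g. one with uniform logits, scores higher.
matched_student = [math.log(t) for t in teacher]
uniform_student = [0.0, 0.0, 0.0]
print(distillation_loss(teacher, matched_student))
print(distillation_loss(teacher, uniform_student))
```

Whether a single student can absorb the ensemble's gain in practice is exactly the open question above; the loss only says what the student is asked to imitate.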


Related concepts in this collection

Should training maximize diversity when models feed into search? If a model runs inside a test-time search loop that samples many rollouts and picks the best, does training for entropy and diversity unlock better solutions than training for a single sharp answer?
convergent-with: the same diversity-over-convergence principle, applied at pretraining rather than test time
Can we prune training data without hurting model performance? This explores whether difficulty metrics can identify redundant training examples that can be safely removed. It matters because most datasets contain massive waste — if we can find which examples are truly necessary, we could train better models on far less data.
convergent-with: both attack the data-constrained regime by changing how a fixed corpus is used rather than adding data
Do critique models improve diversity during training itself? Explores whether critique integrated into the training loop, beyond test-time scoring, actively maintains solution diversity and prevents the model from converging too narrowly during iterative self-training.
grounds the diversity claim: critique is another mechanism that counteracts the tail-narrowing q0 avoids by populating distinct trajectories
