What role does curriculum design play in reasoning emergence?
This reads 'curriculum design' broadly, as the choice of what to train on and in what order so that reasoning appears, and asks whether sequencing actually builds reasoning. Accordingly, I'm reading 'emergence' as the live debate over whether training creates reasoning or merely surfaces it.
This explores whether the order and content of training material are what make reasoning *emerge*, and the corpus's most surprising answer is that, for the most part, they don't create reasoning at all. A cluster of notes argues that reasoning is already latent in base models before any curriculum touches them: minimal interventions such as RL steering, critique fine-tuning, or decoding changes all elicit the same pre-existing capability (Do base models already contain hidden reasoning ability?). RL post-training likewise appears to teach a model *when* to reason rather than *how*, with hybrid models recovering 91% of the performance gains just by routing tokens (Does RL post-training create reasoning or just deploy it?). If that's right, the job of a curriculum shifts from skill-building to deployment timing, closer to the decoupled, activation-then-execution architecture argued for elsewhere in the corpus (How should reasoning systems actually be architected?).
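To make 'routing tokens' concrete, here is a minimal toy sketch of a hybrid decoder, assuming the simplest possible setup: two next-token functions (a base model and a reasoning-tuned model) and a router that decides, position by position, which one emits the next token. All names (`hybrid_decode`, `base_next`, `reasoning_next`, `route`) are hypothetical stand-ins, not the method from the cited note, and the toy 'models' only emit labels. The point is structural: in this framing, what post-training adds is the *when* decision encoded in the router, applied at a small fraction of positions.

```python
from typing import Callable

# Hypothetical stand-ins: a "model" maps the token prefix to one next token,
# and a router maps the prefix to True (use the reasoning model) or False.
NextToken = Callable[[list[str]], str]
Router = Callable[[list[str]], bool]


def hybrid_decode(
    prompt: list[str],
    base_next: NextToken,
    reasoning_next: NextToken,
    route: Router,
    max_new_tokens: int,
) -> tuple[list[str], int]:
    """Toy decoding loop in which a router picks which model emits each token.

    Returns the newly generated tokens and how many were routed to the
    reasoning model.
    """
    tokens = list(prompt)
    routed = 0
    for _ in range(max_new_tokens):
        if route(tokens):
            tokens.append(reasoning_next(tokens))
            routed += 1
        else:
            tokens.append(base_next(tokens))
    return tokens[len(prompt):], routed


def toy_base(prefix: list[str]) -> str:
    return "b"  # labels tokens produced by the base path


def toy_reasoner(prefix: list[str]) -> str:
    return "R"  # labels tokens produced by the reasoning path


def every_fourth(prefix: list[str]) -> bool:
    # Fixed toy rule; a real router would be learned.
    return len(prefix) % 4 == 0


if __name__ == "__main__":
    out, n_routed = hybrid_decode(["q"], toy_base, toy_reasoner, every_fourth, 8)
    print("".join(out), n_routed)
```

Running it prints `bbbRbbbR 2`: only two of the eight generated positions go to the reasoning path. The fixed every-fourth-position rule is purely illustrative; the cited work's actual routing mechanism is not described here.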
So where does curriculum still matter? The strongest case is upstream, in pretraining. An analysis of five million pretraining documents found that reasoning generalizes from broad, diverse *procedural* knowledge (worked examples, methods, derivations), while factual recall leans on narrow, document-specific memorization (Does procedural knowledge drive reasoning more than factual retrieval?). The implication flips the usual intuition: the 'curriculum' that produces reasoning isn't a tidy easy-to-hard ladder but *coverage and diversity* of procedure. What you expose the model to matters more than the order in which you expose it.
The corpus also quietly undermines the central premise of classic curriculum design: that complexity should ramp gradually. One note shows reasoning models don't break at complexity thresholds at all; they break at instance-level *novelty*, succeeding on any chain length if they've seen similar instances (Do language models fail at reasoning due to complexity or novelty?). That dovetails with the finding that chain-of-thought is largely constrained imitation of familiar reasoning forms, degrading predictably under distribution shift (Does chain-of-thought reasoning reveal genuine inference or pattern matching?). Read together, they suggest a curriculum's real lever is instance-space *coverage*, not difficulty progression: you're inoculating against unfamiliarity, not climbing a complexity gradient.
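The contrast between a difficulty ladder and instance-space coverage can be sketched as two ways of ordering the same toy training pool. This is a hypothetical illustration, not an experiment from the cited notes: each `Item` carries an assumed `family` label standing in for 'similar instances' and an assumed `difficulty` score. The ladder sorts easy to hard; the coverage-first schedule round-robins across families so that every family shows up early.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import NamedTuple


class Item(NamedTuple):
    family: str      # hypothetical label for an instance type
    difficulty: int  # hypothetical difficulty score


def difficulty_ladder(items: list[Item]) -> list[Item]:
    """Classic curriculum: strictly easy to hard."""
    return sorted(items, key=lambda item: item.difficulty)


def coverage_first(items: list[Item]) -> list[Item]:
    """Round-robin across families (easy to hard within each family)."""
    by_family: dict[str, list[Item]] = defaultdict(list)
    for item in sorted(items, key=lambda item: item.difficulty):
        by_family[item.family].append(item)
    order: list[Item] = []
    # Take one item from each family per round until all families are exhausted.
    for round_ in zip_longest(*by_family.values()):
        order.extend(item for item in round_ if item is not None)
    return order


def families_seen(schedule: list[Item], budget: int) -> int:
    """Number of distinct families covered by the first `budget` items."""
    return len({item.family for item in schedule[:budget]})


if __name__ == "__main__":
    pool = [
        Item(family, d)
        for family, diffs in (("A", (1, 2, 3)), ("B", (4, 5, 6)), ("C", (7, 8, 9)))
        for d in diffs
    ]
    budget = 3
    print("ladder:", families_seen(difficulty_ladder(pool), budget))
    print("coverage-first:", families_seen(coverage_first(pool), budget))
```

With three families whose difficulties don't overlap, a three-item budget covers one family under the ladder and all three under the coverage-first schedule. Within each family the coverage schedule still goes easy to hard, so the two axes are not mutually exclusive; the sketch only shows which one a fixed budget spends first.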
There are real limits to what coverage can buy, though. Even well-trained reasoning models explore unsystematically, with success probability dropping exponentially as problems deepen (Why do reasoning LLMs fail at deeper problem solving?), a failure that no amount of instance exposure obviously fixes. And there are whole modes that a conventional problem-solving curriculum never touches: combinational, exploratory, and transformational *creative* reasoning go unaddressed by existing methods (Can LLMs reason creatively beyond conventional problem-solving?). Meanwhile, reasoning training can actively cost you elsewhere: knowledge retrieval appears to operate in lower network layers and reasoning adjustment in higher ones, which is the proposed explanation for why reasoning-heavy training improves math but can degrade medical tasks (Why does reasoning training help math but hurt medical tasks?).
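The exponential drop with depth can be illustrated with a deliberately crude model, assuming each of the `depth` steps in a solution must be found independently with the same per-step probability `p_step`. The independence assumption and the helper name are illustrative, not taken from the cited note's analysis of validity, effectiveness, and necessity; the sketch only shows why a per-step shortfall that is tolerable on medium problems compounds on deep ones.

```python
def success_probability(p_step: float, depth: int) -> float:
    """Toy model: all `depth` steps must succeed, each independently with p_step."""
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be a probability in [0, 1]")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return p_step ** depth


if __name__ == "__main__":
    for depth in (2, 5, 10, 20, 40):
        print(f"depth={depth:>2}  success={success_probability(0.9, depth):.3f}")
```

With `p_step = 0.9`, success falls from 0.81 at depth 2 to about 0.35 at depth 10 and about 0.015 at depth 40. Because p^(2d) = (p^d)^2, doubling the depth squares the success rate, which is the sense in which deep problems become catastrophically harder rather than just somewhat harder.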
The thread you didn't know you were pulling: across these notes, 'curriculum' quietly splits into two different jobs. In *pretraining* it's about the diversity of procedural exposure that makes generalizable reasoning possible at all; in *post-training* it's about teaching a model when to fire reasoning it already has, plus covering enough instance variety to survive novelty. Almost none of the corpus supports the folk model of a difficulty ladder that grows capability rung by rung — which means if your goal is reasoning emergence, you may be designing the wrong axis.
Sources (9 notes)
Do base models already contain hidden reasoning ability? Five independent mechanisms (RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR) all elicit reasoning already present in base-model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.
Does RL post-training create reasoning or just deploy it? Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than creating capability. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies exist before any RL.
How should reasoning systems actually be architected? Research shows RL post-training teaches models *when* to use reasoning mechanisms that pretraining already provides. Decoupled architectures, latent reasoning in continuous space, and interleaved action grounding all outperform monolithic chain-of-thought approaches.
Does procedural knowledge drive reasoning more than factual retrieval? Analysis of 5 million pretraining documents shows that reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall, which depends on narrow, document-specific memorization of target facts.
Do language models fail at reasoning due to complexity or novelty? LRMs don't break at complexity thresholds but at instance-novelty boundaries. Models fit instance-based patterns rather than generalizable algorithms, so any reasoning chain succeeds, regardless of length, if the model was trained on similar instances.
Does chain-of-thought reasoning reveal genuine inference or pattern matching? CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shift, the signature of imitation rather than capability emergence.
Why do reasoning LLMs fail at deeper problem solving? Current reasoning models lack the three properties of systematic exploration: validity, effectiveness, and necessity. As a result, success probability drops exponentially with problem depth: medium problems remain solvable, but deep problems become catastrophically harder.
Can LLMs reason creatively beyond conventional problem-solving? Research identifies combinational, exploratory, and transformational reasoning as distinct creative modes grounded in cognitive science. Existing LLM reasoning methods address only conventional problem-solving, leaving creative paradigms unaddressed, which may help explain diversity collapse in ideation.
Why does reasoning training help math but hurt medical tasks? A two-phase inference model shows knowledge retrieval operating in lower network layers and reasoning adjustment in higher layers. This separation explains why reasoning training improves math but can degrade knowledge-intensive domains such as medicine.