Does next-state prediction alone build mechanistic world models or just sophisticated interpolation?
This explores whether training a model to predict the next state of the world actually teaches it how the world works (a genuine causal model you can reason with) or whether it just gets very good at pattern-matching observed sequences without capturing the underlying structure. The corpus comes down hard on the distinction the question hinges on: predicting accurately and understanding are not the same thing, and you can have a lot of the first with almost none of the second.
The strongest evidence for skepticism comes from probing what models *actually* learn when trained on prediction. Transformers trained on orbital mechanics or board games achieve high predictive accuracy, but inspection of their internals shows task-specific heuristics rather than a unified model of the domain: fine-tuning surfaces nonsensical, slice-dependent "laws," and circuit analysis shows arithmetic running on range-matching heuristics instead of algorithms (see "Do foundation models learn world models or task-specific shortcuts?"). That is sophisticated interpolation wearing the costume of a world model. It is hard to catch from the outside for a methodological reason: representational analysis alone finds correlations without causation, and causal or behavioral testing alone shows effects without explaining them. A complete mechanistic claim needs both: locate a candidate feature representationally, then causally intervene to confirm it does the work (see "Can we understand LLM mechanisms with only representational analysis?"). Most claims that "the model learned a world model" never clear that bar.
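The pairing of representational and causal analysis can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not a method taken from the cited notes: `toy_head`, the two-feature hidden state, and the zero-ablation intervention are all hypothetical stand-ins. Feature 1 is built as a noisy copy of feature 0, so a correlational probe flags both, while only ablating feature 0 changes the output.

```python
import math
import random


def toy_head(hidden):
    """Hypothetical model head: the output reads only hidden[0]."""
    return 2.0 * hidden[0]


def make_hidden_states(n=500, seed=0):
    """Two features per state; feature 1 is a noisy copy of feature 0."""
    rng = random.Random(seed)
    states = []
    for _ in range(n):
        signal = rng.gauss(0.0, 1.0)
        states.append([signal, signal + 0.1 * rng.gauss(0.0, 1.0)])
    return states


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def representational_score(states, outputs, idx):
    """Step 1 (representational): correlation of feature idx with the output."""
    return pearson([s[idx] for s in states], outputs)


def causal_effect(states, idx):
    """Step 2 (causal): mean absolute output change when feature idx is zeroed."""
    total = 0.0
    for s in states:
        ablated = list(s)
        ablated[idx] = 0.0
        total += abs(toy_head(s) - toy_head(ablated))
    return total / len(states)


states = make_hidden_states()
outputs = [toy_head(s) for s in states]
for idx in (0, 1):
    print(idx,
          round(representational_score(states, outputs, idx), 3),
          round(causal_effect(states, idx), 3))
```

Both features score highly on the correlational probe, but ablating feature 1 leaves the output untouched. Only the second step separates a feature the model uses from one that merely travels with it.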
So what would separate a mechanistic world model from interpolation? The corpus's answer is counterfactuals and interventions. A useful world model is not one that merely predicts the next observation well; it is one that can simulate *actionable possibilities*: what happens if I do X, and what would have happened otherwise (see "What makes a world model actually useful for reasoning?"). Pushed further, the argument is that next-frame prediction is the wrong objective entirely: world models should be designed to simulate whole spaces of possibility (physical, embodied, emotional, social, mental, counterfactual, and evolutionary), grounded in an agent's decisions rather than passive forecasting (see "What should a world model actually be designed to do?"). By that test, prediction-only training optimizes for the wrong thing and gets exactly what it asks for: surface regularities.
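The gap between forecasting observations and answering "what if I do X" can be made concrete with a toy environment. The sketch below is an illustration under assumptions, not something described in the cited notes: the 1-D corridor, `PassivePredictor`, and `ActionConditionedModel` are hypothetical names. The passive predictor is fit on logs from a policy that always moves right, so it is perfectly accurate on what it observed and still wrong about an action it never saw.

```python
def step(state, action, size=10):
    """True dynamics of a hypothetical 1-D corridor: move, then clamp to the walls."""
    return max(0, min(size - 1, state + action))


def rollout(start, actions):
    """Log (state, action, next_state) triples for a fixed action sequence."""
    log, state = [], start
    for action in actions:
        nxt = step(state, action)
        log.append((state, action, nxt))
        state = nxt
    return log


class PassivePredictor:
    """Memorizes state -> next state from observations and ignores the action."""

    def __init__(self, log):
        self.table = {state: nxt for state, _action, nxt in log}

    def predict(self, state, action=None):
        return self.table[state]


class ActionConditionedModel:
    """Fits one displacement per action from the agent's own interaction log."""

    def __init__(self, log):
        moves = {}
        for state, action, nxt in log:
            moves.setdefault(action, []).append(nxt - state)
        self.displacement = {a: round(sum(d) / len(d)) for a, d in moves.items()}

    def predict(self, state, action):
        return state + self.displacement[action]


passive_log = rollout(0, [+1] * 9)                       # policy: always move right
interaction_log = rollout(3, [+1, +1, -1, -1, -1, +1])   # agent tries both actions

passive = PassivePredictor(passive_log)
acting = ActionConditionedModel(interaction_log)

# Counterfactual query: what happens if the agent moves left from state 5?
print(passive.predict(5, -1), acting.predict(5, -1), step(5, -1))
```

The action-conditioned model also answers the query for state 7, which its interaction log never visited, because it represents the effect of the action rather than a table of observed successors. It remains a simplification: it does not model the walls, so it would mispredict at the corridor ends.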
Yet the corpus doesn't let prediction off cleanly as "mere interpolation," and this is the genuinely surprising part. Native language world models trained purely via next-state prediction on more than 10 million trajectories outperform real-environment training on three benchmarks and transfer across seven domains (see "Can language models learn to simulate agent environments?"). The catch is *what* you predict over: structure in the data does heavy lifting. In-context learning of sequential decision-making emerges only when the context contains full or partial trajectories from the same environment level, not isolated examples; that sequential, same-level structure is the ingredient that lets prediction generalize (see "Why do trajectories matter more than individual examples for in-context learning?"). Similarly, agents that treat the future states produced by their own actions as supervision learn effective models without external rewards (see "Can agents learn from their own actions without external rewards?"), and planning-relevant future information can be embedded directly in the training data with special lookahead tokens, with no change to the architecture or training procedure (see "Can embedding future information in training data improve planning?").
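The lookahead-token idea amounts to a data transformation, which a short sketch can show. This is an assumption-laden illustration and not the TRELAWNEY implementation: the `<LOOKAHEAD>` and `</LOOKAHEAD>` token names, the `insert_lookahead` helper, and the choice of where to insert are all hypothetical. The sketch copies a span from later in a trajectory into a delimited block earlier in the same sequence, so an ordinary next-token objective sees the future information before it has to act.

```python
def insert_lookahead(tokens, insert_at, future_start, span,
                     open_tok="<LOOKAHEAD>", close_tok="</LOOKAHEAD>"):
    """Copy tokens[future_start:future_start + span] into a delimited block
    placed before position insert_at; the original sequence is kept intact."""
    if span < 1:
        raise ValueError("span must be at least 1")
    if not 0 <= insert_at <= future_start:
        raise ValueError("the lookahead must be inserted before the future it copies")
    if future_start + span > len(tokens):
        raise ValueError("future span runs past the end of the sequence")
    future = tokens[future_start:future_start + span]
    return tokens[:insert_at] + [open_tok] + future + [close_tok] + tokens[insert_at:]


# Hypothetical trajectory of alternating states and actions ending in a goal.
trajectory = ["s0", "a0", "s1", "a1", "s2", "a2", "goal"]
augmented = insert_lookahead(trajectory, insert_at=1, future_start=6, span=1)
print(augmented)
```

Because the change lives entirely in the training data, the model, tokenizer vocabulary aside, can be trained with the same infrastructure as before, which matches the "no architecture change" property the note describes.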
The synthesis, then, is that "next-state prediction alone" is underspecified: the action is in the word *alone*. Passive prediction over observations tends toward interpolation. But prediction over the right objects (trajectories, action-consequence loops, intervention outcomes) starts closing the gap, because the model is forced to represent how its own outputs become future inputs. One striking finding is that post-training produces a measurable shift from passive prediction to *enaction*: the model begins recognizing its outputs as actions that shape what it sees next, closing an action-perception loop that pretraining alone lacks; the reported evidence includes 3-4x lower output entropy on-policy (see "Do models recognize their own outputs as actions shaping future inputs?"). That loop, rather than scale or prediction in isolation, is what separates a model that understands consequences from one that just continues a pattern. Whether a fully *mechanistic* model ever comes out of it remains contested; one provocative line argues that computation always presupposes an interpreting agent to give states meaning in the first place (see "Can computation arise without a conscious mapmaker?").
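The source notes cite lower output entropy on-policy as one signature of this shift. The sketch below shows how such a comparison could be computed, assuming access to per-step next-token distributions. The function names are hypothetical and the distributions are made-up illustrative values, not measurements from any model.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of one next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def mean_entropy(distributions):
    """Average per-step entropy over a sequence of next-token distributions."""
    return sum(entropy_bits(d) for d in distributions) / len(distributions)


# Illustrative, made-up distributions (not measurements from any model).
off_policy_steps = [[0.125] * 8, [0.125] * 8]   # flat over eight tokens
on_policy_steps = [[0.5, 0.5], [0.5, 0.5]]      # concentrated on two tokens

ratio = mean_entropy(off_policy_steps) / mean_entropy(on_policy_steps)
print(ratio)
```

A ratio well above 1 would indicate that the model is more certain when continuing its own outputs than when continuing text it did not produce. On its own that is a behavioral signature, not a mechanistic explanation.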
Sources (10 notes)
Inductive bias probes show transformers trained on orbital mechanics and games learn predictive patterns, not unified world structure. Fine-tuning reveals nonsensical, slice-dependent laws; circuit analysis shows arithmetic relies on range-matching heuristics, not algorithms.
Representational analysis alone identifies correlations without causation; causal analysis alone shows behavioral effects without explaining them. Only paired methods—locating candidate features representationally, then verifying causally—produce complete mechanistic claims.
Research shows LLMs may achieve high prediction accuracy through task-specific heuristics without developing coherent generative models of how the world works. True world models must enable reasoning about interventions and counterfactuals, not surface regularities.
Drawing on hypothetical thinking in psychology, world models are most useful when designed to simulate all actionable possibility spaces—physical, embodied, emotional, social, mental, counterfactual, and evolutionary—grounded in agent decision-making rather than passive prediction.
Qwen-AgentWorld demonstrates that native language world models trained via next-state prediction on 10M+ trajectories outperform real-environment training on three benchmarks and transfer across seven domains, positioning next-state prediction as a foundation objective for agents.
In-context learning for sequential decision-making requires full or partial trajectories from the same environment level, not just isolated examples. This structural property—trajectory burstiness—allows models to generalize across vastly different tasks without weight updates.
Research across eight environments shows that agents can use future states from their own actions as supervision without external rewards, matching expert-dependent baselines with half the data and providing superior warm-starts for subsequent RL training.
TRELAWNEY augments training data with special tokens encapsulating future information, allowing models to learn goal-conditioned generation using standard infrastructure. Results show improved planning, algorithmic reasoning, and story generation without modifying architecture or training procedures.
Post-trained language models exhibit a measurable shift where they recognize their outputs become their own future inputs, closing an action-perception loop absent in pretraining. Evidence includes 3-4x lower output entropy on-policy and behavioral signatures of trajectory recognition.
Computational systems depend on a conscious mapmaker who alphabetizes continuous physics into discrete symbols. No increase in algorithmic complexity can generate this agent; it must logically precede the computation it makes possible.