SYNTHESIS NOTE

What five design choices compose a world model?

World models are often presented as monolithic systems, but they actually involve five distinct design decisions—data preparation, representation, reasoning architecture, training objective, and decision integration—that can each fail independently. Understanding this decomposition helps diagnose why world model proposals fall short.

Synthesis note · 2026-05-03 · sourced from World Models

World model proposals often present themselves as monolithic: a video generator, a latent dynamics model, a foundation model. The Critiques of World Models essay argues that this hides a structural fact. A world model (WM) is a composition of five distinct design choices, and any of them can be misaligned with the others. Treating the WM as a single thing makes it impossible to diagnose why it fails, because the failure could lie at any of the five layers. The decomposition also resolves the ambiguity flagged in the related note "Do LLMs actually have world models or just facts?"

The five aspects:

(1) Identifying and preparing training data with the desired world information: what observations does the model see, and do they actually contain the structure needed for the intended downstream tasks?
(2) Adopting a general representation space for the latent world state with possibly richer meaning than the observation data in plain sight: does the latent representation expose the right invariances for reasoning, or does it merely reconstruct the input?
(3) Designing an architecture that allows effective reasoning over the representations: does the model support compositional, counterfactual, hierarchical operations, or only single-step prediction?
(4) Choosing an objective that properly guides the model training: does the loss target the simulation-of-possibilities goal, or does it reward only observation reconstruction?
(5) Determining how to use the world model in a decision-making system: how do the outputs of the WM feed into action selection, planning, or policy?

A WM that gets some of these right and fails on the others exhibits a coherent kind of failure. A video generator with stunning reconstruction quality (aspects 1, 2, and 4) but no architecture for counterfactual queries (aspect 3) and no integration with decision-making (aspect 5) is not a world model in the functional sense, however impressive its outputs. Conversely, a model with rich representations but poor data coverage cannot simulate what its data did not expose.

The design pattern this exposes: when evaluating a proposal claiming to be a world model, decompose the claim into the five aspects and check each. Most of the disagreement in the WM literature is about which aspects matter and how they should be ordered, not about whether to build a WM at all. The five-aspect frame makes those disagreements explicit rather than letting them remain folded into vague terminology.
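The check described above can be sketched as a small per-aspect checklist. This is only an illustration, not something the essay provides: the names `Aspect` and `ProposalReview`, the boolean judgments, and the rule that all five aspects must hold are assumptions made for the example. The sample review encodes the video generator case from this note.

```python
from dataclasses import dataclass, field
from enum import Enum


class Aspect(Enum):
    """The five design choices named in the note (labels are illustrative)."""
    DATA = 1            # training data with the desired world information
    REPRESENTATION = 2  # latent world-state representation
    ARCHITECTURE = 3    # reasoning over the representations
    OBJECTIVE = 4       # training objective
    DECISION_USE = 5    # integration into a decision-making system


@dataclass
class ProposalReview:
    """Hypothetical record of a reviewer's judgment on each aspect."""
    name: str
    satisfied: dict = field(default_factory=dict)  # maps Aspect -> bool

    def missing(self):
        """Aspects judged unmet or not yet assessed, in aspect order."""
        return [a for a in Aspect if not self.satisfied.get(a, False)]

    def is_functional_world_model(self):
        """Assumed rule: every aspect must be judged satisfied."""
        return not self.missing()


# The video generator example from the note: strong on aspects 1, 2, and 4,
# lacking aspects 3 and 5.
video_generator = ProposalReview(
    name="video generator",
    satisfied={
        Aspect.DATA: True,
        Aspect.REPRESENTATION: True,
        Aspect.OBJECTIVE: True,
        Aspect.ARCHITECTURE: False,
        Aspect.DECISION_USE: False,
    },
)

print([a.name for a in video_generator.missing()])
print(video_generator.is_functional_world_model())
```

Recording a judgment per aspect, rather than a single verdict, keeps a disagreement about a proposal tied to the specific aspect in dispute.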

Inquiring lines that read this note 6

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

What capability tradeoffs emerge when scaling model reasoning abilities?

Why do foundation models develop heuristics instead of world models?

What are the consequences of models training on synthetic data?

Can a world model have rich representations without adequate data coverage?

Do language models develop causal world models or rely on statistical patterns?

Can AI-generated outputs constitute genuine knowledge or valid claims?

What's the difference between representing world facts and generating world mechanisms?

How should planning and perception grounding be factored in agent design?

What are the five inseparable design choices when building world models?

Related concepts in this collection 5


What should a world model actually be designed to do? Current AI research treats world models as either video predictors or RL dynamics learners, but what if their real purpose is simulating actionable possibilities for decision-making rather than predicting next observations?
extends: companion piece — the goal definition picks aspect 5 (decision integration) and aspect 4 (objective) as primary
Do LLMs actually have world models or just facts? The term 'world model' conflates two different capabilities: factual representation versus mechanistic understanding. Understanding which one LLMs actually possess matters for assessing their reasoning reliability.
complements: the five-aspect frame disambiguates the WM term that this earlier insight argued was conflated
Can language models simulate belief change in people? Current LLM social simulators treat behavior as input-output mappings without modeling internal belief formation or revision. Can they be redesigned to actually track how people think and change their minds?
exemplifies: behaviorist social-sim agents fail aspect 2 (representation) and aspect 3 (architecture for counterfactuals) — concrete instance of the misalignment the framework predicts
Can we measure reasoning quality beyond output plausibility? How might we evaluate whether AI systems reason internally like humans do, rather than just producing human-like outputs? This matters because surface coherence can mask broken underlying reasoning.
exemplifies: RECAP operationalizes aspect 4 (objective) as something measurable rather than vague
Can identical outputs hide broken internal representations? Can neural networks produce correct outputs while having fundamentally fractured internal structure that prevents generalization and creativity? This challenges our assumptions about what performance benchmarks actually measure.
extends: aspect 2 (representation) failure mode — strong outputs from broken latents are exactly what the five-aspect decomposition exposes
