Looped World Models
Current world models face a fundamental tension: faithful long-horizon simulation demands deep computation, but deeper models are expensive to deploy and prone to compounding errors. We address this tension with Looped World Models (LoopWM), the first looped architectures for world modelling. Our method iteratively refines latent environment states through a parameter-shared transformer block. This yields up to 100× parameter efficiency over conventional approaches, together with adaptive computation that automatically scales depth to match the complexity of each prediction step. Orthogonal to scaling model size and training data, LoopWM establishes iterative latent depth as a new scaling axis for world simulation.
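To make the idea of looped refinement with adaptive depth concrete, the following is a minimal toy sketch, not the LoopWM implementation. It assumes a small tanh-linear map as a stand-in for the parameter-shared transformer block and uses a simple convergence threshold as the halting rule; the names `shared_block`, `looped_refine`, `W`, `B`, `max_loops`, and `tol` are all illustrative assumptions.

```python
import math

# Hypothetical shared parameters: the same weights are reused at every loop.
# The rows of W have small absolute sums, so the map below is a contraction.
W = [[0.3, -0.1],
     [0.2, 0.1]]
B = [0.1, -0.2]


def shared_block(state, weights, bias):
    """One pass of the shared block: tanh(weights @ state + bias).

    A toy stand-in for a transformer block, chosen only to keep the
    example dependency-free.
    """
    return [
        math.tanh(sum(w * s for w, s in zip(row, state)) + b)
        for row, b in zip(weights, bias)
    ]


def looped_refine(state, weights, bias, max_loops=64, tol=1e-6):
    """Apply the same block repeatedly, stopping once updates become small.

    Returns the refined state and the number of loops actually used, so
    inputs that are already near a fixed point consume less depth.
    """
    loops_used = 0
    for loops_used in range(1, max_loops + 1):
        new_state = shared_block(state, weights, bias)
        # Largest per-coordinate change produced by this loop.
        delta = max(abs(n - s) for n, s in zip(new_state, state))
        state = new_state
        if delta < tol:
            break
    return state, loops_used


# A "hard" input far from the fixed point needs several loops ...
final_state, hard_loops = looped_refine([5.0, -5.0], W, B)
# ... while an input already at the fixed point halts after one loop.
_, easy_loops = looped_refine(final_state, W, B)
print(hard_loops, easy_loops)
```

In this sketch the parameter count is independent of the effective depth: only `W` and `B` are stored, while the number of loops varies per input. A threshold on the update size is just one simple halting criterion and is used here purely for illustration.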
Introduction. World models (WMs) learn to predict how an environment evolves in response to actions. They have become a cornerstone of sample-efficient reinforcement learning and embodied intelligence (Ha & Schmidhuber, 2018; Hafner et al., 2019; Kaiser et al., 2020). Notably, the Deep Planning Network (PlaNet; Hafner et al., 2019) first demonstrated that agents can learn latent dynamics entirely from pixels and plan via online optimisation, establishing the recurrent state-space model (RSSM) as a foundational architecture for world modelling. The Dreamer family of models (Hafner et al., 2020; 2021; 2025) then progressively refined this approach, culminating in DreamerV3 (Hafner et al., 2025), which masters over 150 different tasks with a single set of hyperparameters. Seeking to leverage the representational power of transformers, subsequent work replaced or augmented the recurrent backbone. IRIS (Micheli et al., 2023) showed that an autoregressive transformer over discrete latent tokens can serve as a highly data-efficient world model.
Discussion / Conclusion. We have presented Looped World Models, the first application of looped transformer architectures to world modelling. Our approach addresses a central tension in current world models: generating faithful long-horizon simulations demands deep computation, yet deeper models incur prohibitive deployment costs and are susceptible to compounding rollout errors. By iteratively refining latent environment states through a parameter-shared transformer block with stabilised residual dynamics, LoopWM structurally mirrors the recurrence inherent in physical systems while maintaining a compact parameter footprint. Empirically, LoopWM achieves up to 100× parameter efficiency over conventional approaches without sacrificing prediction quality. Theoretically, we show that spectral-norm constraints on state transitions yield provably stable rollouts, providing formal guarantees that are absent in standard autoregressive world models. Furthermore, our adaptive computation mecha-