Assessing adaptive world models in machines with novel games

Paper · arXiv 2507.12821 · Published July 17, 2025
LLM Evaluations and Benchmarks · World Models

Human intelligence exhibits a remarkable capacity for rapid adaptation and effective problem-solving in novel and unfamiliar contexts. We argue that this profound adaptability is fundamentally linked to the efficient construction and refinement of internal representations of the environment, commonly referred to as world models, and we refer to this adaptation mechanism as world model induction. However, current understanding and evaluation of world models in artificial intelligence (AI) remain narrow, often focusing on static representations learned from training on massive corpora of data, instead of the efficiency and efficacy of models in learning these representations through interaction and exploration within a novel environment. In this Perspective, we provide a view of world model induction drawing on decades of research in cognitive science on how humans learn and adapt so efficiently; we then call for a new evaluation framework for assessing adaptive world models in AI. Concretely, we propose a new benchmarking paradigm based on suites of carefully designed games with genuine, deep and continually refreshing novelty in the underlying game structures; we refer to games of this kind as novel games.

Introduction. A hallmark of human intelligence is the capacity for rapid adaptation, solving new problems quickly under novel and unfamiliar conditions. Over evolutionary timescales, this adaptive intelligence has enabled humans to survive and flourish in a vast landscape of complex and ever-changing environments. In modern life, people are continually adapting to new social situations such as new laws, cultural environments, partners and foes, often (if not always) with remarkable effectiveness and efficiency. Decades of research in cognitive science suggest that a key mechanism supporting this rapid adaptation is the construction and refinement of mental models and intuitive theories to explain the world (Johnson-Laird, 1983; Gopnik and Wellman, 2012; Gelman and Legare, 2011; Tenenbaum et al., 2011; Gerstenberg and Tenenbaum, 2017; Ullman and Tenenbaum, 2020).

Discussion / Conclusion. In this Perspective, we have argued that a critical component for developing truly general and robust artificial intelligence lies in its capacity for adaptation to novel circumstances. This adaptive capability is fundamentally linked to the agent’s ability to rapidly induce and dynamically refine internal world models when confronted with unknown environments. We then introduced an evaluation paradigm centered around carefully constructed novel games. This framework is specifically designed to evaluate AI systems on their capacity for adaptive world modeling, which is essential for efficient learning and robust generalization in dynamic, unforeseen environments where underlying rules and structures are often hidden from the agent. While we believe this paradigm offers a valuable path forward, we acknowledge several important outstanding questions and challenges that warrant future investigation and refinement. A fundamental question that may arise regarding our central thesis is the extent to which hierarchical and adaptive world models are truly necessary for rapid and efficient adaptation.