Mastering Diverse Domains through World Models

Paper · arXiv 2301.04104 · Published January 10, 2023

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algorithms can be readily applied to tasks similar to what they have been developed for, configuring them for new application domains requires significant human expertise and experimentation. We present DreamerV3, a general algorithm that outperforms specialized methods across over 150 diverse tasks, with a single configuration. Dreamer learns a model of the environment and improves its behavior by imagining future scenarios. Robustness techniques based on normalization, balancing, and transformations enable stable learning across domains. Applied out of the box, Dreamer is the first algorithm to collect diamonds in Minecraft from scratch without human data or curricula. This achievement has been posed as a significant challenge in artificial intelligence that requires exploring farsighted strategies from pixels and sparse rewards in an open world. Our work allows solving challenging control problems without extensive experimentation, making reinforcement learning broadly applicable.
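As one illustration of the "transformations" mentioned above: the full DreamerV3 paper describes a symlog transform that compresses large positive and negative values symmetrically, which helps a single configuration cope with rewards and observations of very different scales. This excerpt does not name the transform, so the following is only a minimal sketch of that idea in plain Python. The function names `symlog` and `symexp` and the sample rewards are illustrative choices, not the authors' code.

```python
import math


def symlog(x: float) -> float:
    """Symmetric log compression: sign(x) * ln(|x| + 1)."""
    return math.copysign(math.log1p(abs(x)), x)


def symexp(x: float) -> float:
    """Inverse of symlog: sign(x) * (exp(|x|) - 1)."""
    return math.copysign(math.expm1(abs(x)), x)


if __name__ == "__main__":
    # Hypothetical rewards spanning several orders of magnitude.
    for reward in (-1000.0, -1.0, 0.0, 0.5, 1000.0):
        squashed = symlog(reward)
        restored = symexp(squashed)
        print(f"{reward:>8} -> {squashed:+.3f} -> {restored:+.3f}")
```

Near zero the transform behaves almost like the identity, while large magnitudes grow only logarithmically, so a predictor trained on transformed targets sees a similar numeric range across domains.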

Introduction. Reinforcement learning has enabled computers to solve tasks through interaction, such as surpassing humans in the games of Go and Dota [1,2]. It is also a key component for improving large language models beyond what is demonstrated in their pretraining data [3,4]. While PPO [5] has become a standard algorithm in the field of reinforcement learning, more specialized algorithms are often employed to achieve higher performance. These specialized algorithms target the unique challenges posed by different application domains, such as continuous control [6], discrete actions [7,8], sparse rewards [9], image inputs [10], spatial environments [11], and board games [12]. However, applying reinforcement learning algorithms to sufficiently new tasks, such as moving from video games to robotics tasks, requires substantial effort, expertise, and computational resources for tweaking the hyperparameters of the algorithm [13].

Discussion / Conclusion. We present the third generation of the Dreamer algorithm, a general reinforcement learning algorithm that masters a wide range of domains with fixed hyperparameters. Dreamer excels not only across over 150 tasks but also learns robustly across varying data and compute budgets, moving reinforcement learning toward a wide range of practical applications. Applied out of the box, Dreamer is the first algorithm to collect diamonds in Minecraft from scratch, achieving a significant milestone in the field of artificial intelligence. As a high-performing algorithm that is based on a learned world model, Dreamer paves the way for future research directions, including teaching agents world knowledge from internet videos and learning a single world model across domains to allow artificial agents to build up increasingly general knowledge and competency.