Qwen-AgentWorld: Language World Models for General Agents

Paper · arXiv 2606.24597 · Published June 23, 2026

A world model predicts environment dynamics from current observations and actions, serving as a core cognitive mechanism for reasoning and planning. In this work, we investigate how world modeling with language models can further push the boundaries of general agents. (i) We first focus on building foundation models for agentic environment simulation. We introduce Qwen-AgentWorld-35B-A3B and Qwen-AgentWorld-397B-A17B, the first language world models capable of simulating agentic environments across 7 domains via long chain-of-thought reasoning. Leveraging more than 10M environment interaction trajectories collected from real-world environments in these 7 domains, we develop Qwen-AgentWorld through a three-stage training pipeline: continual pre-training (CPT) injects general-purpose world modeling capabilities from state-transition dynamics and augmented professional corpora, supervised fine-tuning (SFT) activates next-state-prediction reasoning, and reinforcement learning (RL) sharpens simulation fidelity through a tailored framework with hybrid rubric-and-rule rewards.
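The abstract does not specify how the hybrid rubric-and-rule reward is computed. As a purely illustrative sketch under stated assumptions, one plausible shape is: the predicted next observation must parse as a JSON object (a format rule that gates the reward), deterministic fields are scored by exact match against the ground-truth observation (the rule part), and a weighted rubric scores the remaining content (the rubric part). RubricItem, rule_score, rubric_score, hybrid_reward, and the mixing weight alpha are hypothetical names, not the paper's, and a real rubric would more likely be graded by an LLM judge than by Python predicates.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RubricItem:
    """Hypothetical rubric criterion: a description, a checker, and a weight."""
    description: str
    check: Callable[[Dict], bool]
    weight: float = 1.0


def rule_score(pred: Dict, truth: Dict, keys: List[str]) -> float:
    """Rule part: fraction of deterministic fields that exactly match the ground truth."""
    if not keys:
        return 1.0
    return sum(pred.get(k) == truth.get(k) for k in keys) / len(keys)


def rubric_score(pred: Dict, rubric: List[RubricItem]) -> float:
    """Rubric part: weighted fraction of criteria the prediction satisfies."""
    total = sum(item.weight for item in rubric)
    if total == 0:
        return 0.0
    return sum(item.weight for item in rubric if item.check(pred)) / total


def hybrid_reward(pred_text: str, truth: Dict, keys: List[str],
                  rubric: List[RubricItem], alpha: float = 0.5) -> float:
    """Gate on a format rule, then blend rule and rubric scores with weight alpha."""
    try:
        pred = json.loads(pred_text)  # format rule: output must be a JSON object
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(pred, dict):
        return 0.0
    return alpha * rule_score(pred, truth, keys) + (1 - alpha) * rubric_score(pred, rubric)


# Toy ground-truth observation and a one-item rubric (both hypothetical).
truth = {"status": 200, "body": "3 results found"}
rubric = [RubricItem("reports a result count",
                     lambda p: "results" in str(p.get("body", "")))]
print(hybrid_reward('{"status": 200, "body": "2 results found"}',
                    truth, ["status", "body"], rubric))  # 0.75
print(hybrid_reward("not json", truth, ["status", "body"], rubric))  # 0.0
```

In this sketch, gating on the format rule keeps unparseable predictions at zero reward, while the blend still gives a partially correct observation a graded signal; whether Qwen-AgentWorld gates, blends, or weights its rewards this way is an open assumption.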

Introduction. World models have been widely recognized as a foundation for general intelligence (Ball et al., 2025; World Labs team, 2025; Xiang et al., 2025a; Ali et al., 2025), with a growing consensus that learning to predict the world is a prerequisite for acting effectively within it (LeCun et al., 2022; Hafner et al., 2023; 2025; Assran et al., 2025). Richens et al. (2025) further prove a stronger claim: any agent capable of generalizing across a sufficiently broad range of tasks must have learned a world model, establishing world models as not merely useful but necessary for general-purpose agents. Yet the language environments in which LLM agents operate still lack a general-purpose world model. In the agent–environment interaction loop, two complementary components are essential: the policy (states → actions) and the world model ((states, actions) → next states). However, current research on LLM agents has focused almost exclusively on the policy side. We argue that world modeling is a crucial missing piece on the path to general agents.
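To make the policy/world-model split concrete, here is a minimal sketch, assuming text states and text actions, in which a policy is rolled out against a world model instead of a real environment. The function rollout, the toy_policy, and the hand-written toy_world_model are hypothetical stand-ins for illustration; in the setting described above, the world model would be a language model predicting the next observation.

```python
from typing import Callable, List, Tuple

State = str
Action = str
Policy = Callable[[State], Action]             # policy: states -> actions
WorldModel = Callable[[State, Action], State]  # world model: (states, actions) -> next states


def rollout(policy: Policy, world_model: WorldModel,
            initial_state: State, horizon: int) -> List[Tuple[State, Action, State]]:
    """Alternate policy and world-model steps, recording each transition."""
    transitions = []
    state = initial_state
    for _ in range(horizon):
        action = policy(state)
        next_state = world_model(state, action)  # predicted, not executed in a real env
        transitions.append((state, action, next_state))
        state = next_state
    return transitions


def toy_policy(state: State) -> Action:
    """Hypothetical policy: open the door if it is closed, else walk through."""
    return "open door" if "door is closed" in state else "walk through"


def toy_world_model(state: State, action: Action) -> State:
    """Hypothetical rule-based stand-in for a language world model's prediction."""
    if action == "open door":
        return "The door is open."
    if action == "walk through" and "door is open" in state:
        return "You are in the next room."
    return state  # unrecognized transitions leave the state unchanged


traj = rollout(toy_policy, toy_world_model, "The door is closed.", horizon=3)
for s, a, s_next in traj:
    print(f"{s!r} --{a}--> {s_next!r}")
```

Because rollout depends only on the two function signatures, the same loop runs unchanged whether next states come from a real environment or from a learned simulator, which is what allows a world model to stand in for the environment.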

Discussion / Conclusion. We presented Qwen-AgentWorld, the first family of native language world models (LWMs) covering seven agent interaction domains within a single model at two scales (35B-A3B and 397B-A17B). A three-stage recipe, “CPT injects, SFT activates, RL sharpens”, progressively injects environment knowledge, activates next-state-prediction reasoning, and sharpens simulation fidelity. We also introduced AgentWorldBench, an LWM benchmark that pairs every sample with a ground-truth observation from a real environment. Using the LWM as a decoupled simulator, we validate the effectiveness of controllable simulation on 3 agentic benchmarks, where it outperforms both uncontrolled simulation and training in real environments. When the LWM serves as a unified agent foundation model, LWM warm-up consistently improves downstream agent performance across 7 diverse tasks through cross-domain transfer, providing initial evidence that LWMs can serve as a foundation for building stronger agent models. By enabling controllable simulation beyond real environments and establishing next-state prediction as a transferable foundation for agents, language world modeling opens a new axis for scaling general agents beyond what real-environment interaction alone can provide.