Learning Agent-Compatible Context Management for Long-Horizon Tasks

Paper · arXiv 2605.30785 · Published May 29, 2026

LLM agents increasingly face long-horizon tasks such as web search and deep research in real-world applications, where accumulated context can cause long-context degradation and reasoning failures. Prior work mitigates this through context management, using either agent-side context control or fixed strategies such as summarization. These approaches require training the agent itself for adaptation, which is impractical for closed-source agents and ignores that different agents may require different strategies. We introduce Adaptive Context Management (AdaCoM), which trains an external LLM to manage the context of a frozen agent through flexible modification actions and end-to-end reinforcement learning. Across diverse agents on web search and deep research benchmarks, AdaCoM substantially improves performance by preserving task constraints and progress while pruning stale content. The learned strategies reveal a Fidelity–Reliability Trade-off: agents with higher vanilla ReAct performance benefit from higher-fidelity context preservation, whereas lower-performing agents require more aggressive compression to stay within a reliable reasoning regime.

Introduction. With advances in semantic understanding, tool use, and interactive decision making, general-purpose LLM agent applications such as OpenClaw and Hermes Agent have emerged (Steinberger and contributors, 2025; Nous Research, 2025). Such applications often involve long-horizon reasoning, where tasks such as answering multi-constraint search queries (Wei et al., 2025; Li et al., 2025) or producing deep research reports (Du et al., 2025; Wang et al., 2025) require many interdependent steps over a growing context. A central bottleneck for LLMs in such long-horizon tasks is long-context degradation. As tool results and intermediate reasoning accumulate, stale or irrelevant content can obscure salient evidence, amplify positional bias, and make subsequent decisions less reliable (Xiao et al., 2024; Liu et al., 2024; Shi et al., 2023). Prior work addresses this issue through context management, but typically places the burden of managing context on the agent itself.

Discussion / Conclusion. We introduced AdaCoM, an adaptive context management framework that trains an external manager to manage an agent’s running context while keeping the underlying agent frozen. With a flexible modification action space and RL training, AdaCoM learns agent-compatible context management strategies. Experiments show that AdaCoM consistently improves diverse agents and can transfer to unseen agents with similar baseline capability. Our analysis further reveals a Fidelity–Reliability Trade-off: effective context management should preserve as much task-relevant information as possible while keeping the agent within a context regime where its reasoning remains reliable.
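
The control flow described above, in which an external manager edits the running context before each step of a frozen agent, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the action names (`keep`, `delete`, `rewrite`), the `Edit` type, and the helpers `apply_edits`, `run_episode`, `toy_manager`, and `make_toy_agent` are all hypothetical, and a hand-written pruning rule stands in for the RL-trained manager LLM. The agent is modeled as an opaque callable to reflect that its weights are never updated.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Edit:
    """One hypothetical modification action on a single context entry."""
    index: int
    op: str          # "keep", "delete", or "rewrite" (illustrative names only)
    text: str = ""   # replacement text, used only by "rewrite"


def apply_edits(context: List[str], edits: List[Edit]) -> List[str]:
    """Return a new context with the edits applied; unedited entries are kept."""
    by_index = {e.index: e for e in edits}
    new_context = []
    for i, entry in enumerate(context):
        edit = by_index.get(i)
        if edit is None or edit.op == "keep":
            new_context.append(entry)
        elif edit.op == "rewrite":
            new_context.append(edit.text)
        elif edit.op != "delete":
            raise ValueError(f"unknown op: {edit.op}")
    return new_context


def run_episode(
    task: str,
    manager: Callable[[List[str]], List[Edit]],
    agent_step: Callable[[List[str]], str],
    max_steps: int = 8,
) -> List[str]:
    """Alternate between context management and frozen-agent steps."""
    context = [f"TASK: {task}"]
    for _ in range(max_steps):
        # The manager sees the full context and proposes edits.
        context = apply_edits(context, manager(context))
        # The agent is a black box here: it only reads the managed context.
        output = agent_step(context)
        context.append(output)
        if output.startswith("FINAL:"):
            break
    return context


def toy_manager(context: List[str]) -> List[Edit]:
    """Stand-in policy: keep the task line and the two newest entries."""
    return [Edit(i, "delete") for i in range(1, max(1, len(context) - 2))]


def make_toy_agent(n_steps: int) -> Callable[[List[str]], str]:
    """Stand-in agent that emits observations, then a final answer."""
    calls = {"n": 0}

    def agent_step(context: List[str]) -> str:
        calls["n"] += 1
        if calls["n"] >= n_steps:
            return "FINAL: done"
        return f"OBS: result {calls['n']}"

    return agent_step


final_context = run_episode("find X", toy_manager, make_toy_agent(5))
print(final_context)
```

In this sketch the task line is never deleted, which mirrors the idea of preserving task constraints while pruning stale content. In the framework described above, the choice of which entries to keep, drop, or rewrite would come from a learned policy rather than a fixed rule, and how aggressively it compresses would depend on the agent being managed.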
