SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

Paper · arXiv 2606.09730 · Published June 8, 2026

Large language models are increasingly expected to handle complex, long-horizon real-world tasks whose context demands can grow without bound, yet model context windows remain inherently finite. Recent work explores a paradigm in which a main agent decomposes tasks and dispatches subtasks to subagents, which execute them and return only summarized results, conserving the main agent’s context budget. However, doing this well requires delegation intelligence: the ability to decompose complex tasks, determine when and what to delegate, and integrate returned results into the ongoing workflow. Training data for this capability is scarce in naturally occurring text, and, to our knowledge, how to synthesize such data and train models to acquire this capability remains largely unexplored in the open-source community. To bridge this gap, we present a preliminary exploration targeting deep research, a representative long-horizon agent task. Specifically, we design a harness that guides the model toward high-quality task decomposition and delegation, while constraining subagents to return results in a form that supports the main agent’s workflow.
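To make the context-budget argument concrete, here is a minimal Python sketch of the delegation paradigm under stated assumptions; it is not the SearchSwarm harness. `MainAgent`, `run_subagent`, and `SubagentResult` are hypothetical names, subagent tool calls are simulated with placeholder strings, and whitespace word counts stand in for real tokenization. What it illustrates is that a subagent's many observations stay in its own context, while only a short summary enters the main agent's context.

```python
from dataclasses import dataclass, field


@dataclass
class SubagentResult:
    subtask: str
    summary: str       # the only text handed back to the main agent
    local_tokens: int  # tokens the subagent consumed in its own context


def count_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words instead of a real tokenizer.
    return len(text.split())


def run_subagent(subtask: str) -> SubagentResult:
    """Hypothetical subagent: simulates many tool observations, returns a summary."""
    observations = [f"search result {i} for {subtask!r}" for i in range(50)]
    local_tokens = sum(count_tokens(o) for o in observations)
    summary = f"Findings for {subtask!r} (condensed from {len(observations)} observations)"
    return SubagentResult(subtask, summary, local_tokens)


@dataclass
class MainAgent:
    context: list[str] = field(default_factory=list)

    def context_tokens(self) -> int:
        return sum(count_tokens(m) for m in self.context)

    def research(self, question: str, subtasks: list[str]) -> list[SubagentResult]:
        self.context.append(f"Question: {question}")
        results = []
        for sub in subtasks:
            result = run_subagent(sub)
            # Only the summary enters the main context; raw observations are discarded.
            self.context.append(result.summary)
            results.append(result)
        return results


if __name__ == "__main__":
    agent = MainAgent()
    results = agent.research("Who founded X?", ["find X", "verify Y"])
    print("main-agent context tokens:", agent.context_tokens())
    print("tokens kept inside subagents:", sum(r.local_tokens for r in results))
```

In this toy run, each subagent privately processes 300 whitespace tokens of observations, while the main agent's context grows by only 8 tokens per delegated subtask.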

Introduction. Large language models are increasingly deployed as agents for complex, long-horizon real-world tasks whose context demands can grow without bound (Jimenez et al., 2024; Zhang et al., 2026; Yang et al., 2026), yet model context windows remain inherently finite. This fundamental tension necessitates context management strategies that selectively retain or condense information to fit within limited capacity. Early approaches include summarizing the interaction history once it exceeds a length threshold and retaining only a portion of tool outputs, among others (Liu et al., 2025; Zeng et al., 2026; MiroMind Team, 2026). However, these methods are fundamentally passive: they involve no advance planning, instead either waiting until the context budget is exhausted before compressing or discarding past observations indiscriminately according to fixed rules. In contrast, a paradigm in which the main agent decomposes tasks and delegates subtasks to subagents represents a more active and intelligent form of context management (Anthropic, 2025a).
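For contrast, the two passive strategies named in the introduction can be sketched as follows. This is an illustrative Python sketch, not code from any of the cited systems: `compress_after_threshold`, `keep_last_tool_outputs`, and `summarize` are hypothetical names, the summarizer is a placeholder rather than an LLM call, and token counts are approximated by whitespace splitting. Both functions act only by rule and after the fact; neither decides in advance what information the task will need.

```python
# A message is (role, text), with role in {"user", "assistant", "tool"}.
Message = tuple[str, str]


def count_tokens(history: list[Message]) -> int:
    # Whitespace split as a crude stand-in for a real tokenizer.
    return sum(len(text.split()) for _, text in history)


def summarize(history: list[Message]) -> Message:
    # Placeholder: a real system would ask an LLM to write this summary.
    return ("assistant", f"[summary of {len(history)} messages]")


def compress_after_threshold(
    history: list[Message], budget: int, keep_recent: int = 2
) -> list[Message]:
    """Threshold rule: only once over budget, collapse all but the newest messages."""
    if count_tokens(history) <= budget or len(history) <= keep_recent:
        return history
    return [summarize(history[:-keep_recent])] + history[-keep_recent:]


def keep_last_tool_outputs(history: list[Message], keep_last: int = 1) -> list[Message]:
    """Fixed rule: keep only the newest `keep_last` tool outputs, whatever their relevance."""
    tool_idx = [i for i, (role, _) in enumerate(history) if role == "tool"]
    # Guard keep_last == 0: tool_idx[:-0] would be empty and drop nothing.
    dropped = set(tool_idx[:-keep_last]) if keep_last > 0 else set(tool_idx)
    return [m for i, m in enumerate(history) if i not in dropped]


if __name__ == "__main__":
    history: list[Message] = [
        ("user", "find the founder of X"),
        ("tool", "page one text here"),
        ("assistant", "checking second source"),
        ("tool", "page two text here"),
    ]
    print(compress_after_threshold(history, budget=10))
    print(keep_last_tool_outputs(history, keep_last=1))
```

Note that the first rule fires regardless of whether the summarized messages are still needed, and the second drops the older page even if it held the key evidence.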

Discussion / Conclusion. We present SearchSwarm, a preliminary exploration of training delegation intelligence for long-horizon agent tasks, and show its effectiveness on deep research. We design a harness that guides the main agent toward effective task decomposition, comprehensive subagent briefing, and citation-grounded result integration, and demonstrate that this harness improves deep research performance at inference time. Using the harness, we synthesize supervised fine-tuning data that internalizes delegation behavior into model weights. The resulting model, SearchSwarm-30B-A3B, achieves state-of-the-art performance among models of comparable scale on BrowseComp, BrowseComp-ZH, GAIA, and xbench-DeepSearch, while remaining competitive with models more than 10× larger. Analysis shows that the delegation intelligence acquired through training generalizes to single-agent settings and open-ended research tasks, suggesting that the structured investigative patterns encoded in our training data confer benefits beyond the specific delegation paradigm.
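The exact report format used for citation-grounded result integration is not described here, so the following Python sketch is only one plausible reading, under assumptions. It supposes (hypothetically) that a subagent returns a list of claims, each with cited URLs, together with a log of the URLs it actually visited, and that the main agent integrates only claims whose citations all appear in that log. `Claim`, `SubagentReport`, `grounded_claims`, and `integrate` are illustrative names, not the SearchSwarm API.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    citations: list[str]  # hypothetical: URLs the subagent cites for this claim


@dataclass
class SubagentReport:
    claims: list[Claim]
    visited_urls: set[str]  # hypothetical: sources the subagent actually opened


def grounded_claims(report: SubagentReport) -> list[Claim]:
    """Keep claims that cite at least one source, all of which were actually visited."""
    return [
        c for c in report.claims
        if c.citations and all(url in report.visited_urls for url in c.citations)
    ]


def integrate(notes: list[str], report: SubagentReport) -> list[str]:
    """Append grounded claims to the main agent's notes, keeping inline citations."""
    for c in grounded_claims(report):
        notes.append(f"{c.text} [{', '.join(c.citations)}]")
    return notes


if __name__ == "__main__":
    report = SubagentReport(
        claims=[
            Claim("A founded X in 1999", ["https://a.example.com"]),
            Claim("X has 10 staff", []),                        # uncited: rejected
            Claim("X moved to Y", ["https://b.example.com"]),   # never visited: rejected
        ],
        visited_urls={"https://a.example.com"},
    )
    print(integrate([], report))
```

Rejecting uncited or unverifiable claims at integration time keeps the main agent's notes traceable to sources, at the cost of discarding claims a subagent may have reached by reasoning rather than retrieval.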
