GrepSeek: Training Search Agents for Direct Corpus Interaction
Large Language Model (LLM) search agents have shown strong promise for knowledge-intensive language tasks through multiple rounds of reasoning and information retrieval. Most existing systems access information using a retriever that takes a keyword or natural language query and returns a ranked list of documents using an index of pre-computed document representations. In this work, we explore a complementary perspective in which the search agent treats the corpus itself as the search environment and finds evidence by issuing executable shell commands. We introduce GrepSeek, a compact direct corpus interaction (DCI) search agent optimized to find, filter, and compose evidence from large text corpora. To address the instability of learning this behavior directly with reinforcement learning on large corpora, we propose a two-stage training pipeline. First, we construct a cold-start dataset using an answer-aware Tutor and an answer-blind Planner to generate verified, causally grounded search trajectories. Second, we refine the initialized policy with Group Relative Policy Optimization (GRPO), allowing the agent to improve its task-oriented search behavior through direct interaction with the corpus.
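To make the second stage concrete, the sketch below illustrates the group-relative advantage at the core of GRPO in its commonly described form: several trajectories are sampled for the same question, and each trajectory's reward is standardized against the mean and standard deviation of its own group. This is an illustration under stated assumptions, not the training code used in this work; the function name `group_relative_advantages`, the binary rewards, the use of the population standard deviation, and the `eps` constant are all hypothetical choices made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each rollout's reward against its own sampling group.

    `rewards` holds one scalar reward per trajectory sampled for the SAME
    question. `eps` (an assumed constant) avoids division by zero when all
    trajectories in the group receive the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical trajectories for one question; 1.0 marks a correct answer.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Trajectories that beat their group's average receive a positive advantage and the rest a negative one, so no separate value network is needed to provide a baseline. When every trajectory in a group earns the same reward, all advantages are zero and the group contributes no learning signal.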
Introduction. Large Language Model (LLM) search agents (or search agents for short) (Li et al., 2025; Jin et al., 2025) have shown strong promise in addressing complex information needs that may require reasoning, query decomposition, and/or information synthesis from multiple sources. These agents benefit from multiple interactions with a retrieval model to obtain the information required for their knowledge-intensive tasks. In the case of unstructured or semi-structured text corpora, these interactions take the form of keyword or natural language queries. These approaches rely on decades of research in developing retrieval models, from lexical matching (Salton & Buckley, 1988; Robertson et al., 1994; Ponte & Croft, 1998) to semantic matching based on dense representations (Deerwester et al., 1990; Karpukhin et al., 2020) or sparse representations (Zamani et al., 2018; Formal et al., 2021). These models operate on pre-computed representations of documents to construct an index for the corpus.
Discussion / Conclusion. We introduced GrepSeek, a Direct Corpus Interaction (DCI) search agent that bypasses traditional pre-computed search indexes by operating directly over raw text corpora using standard Unix shell commands. Through a two-stage training pipeline (cold-start SFT on synthetically generated trajectories, followed by RL with GRPO), we demonstrated that search agents can learn to execute highly effective, interpretable, and lexically precise retrieval programs. GrepSeek achieves strong performance on challenging multi-hop reasoning benchmarks by precisely isolating symbolic patterns and enforcing strict entity-level constraints, succeeding in scenarios where dense embedding-based models often fail due to semantic conflation. In addition, our optimized sharded-parallel execution engine substantially reduces runtime memory requirements and eliminates the expensive offline indexing stage required by dense retrieval systems.
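The general idea of index-free, sharded lexical search can be sketched in a few lines. The example below is not the execution engine described in this work, whose internals are not given here; it is a minimal pure-Python stand-in that assumes the corpus is split into plain-text shard files with one passage per line. The names `search_shard` and `search_corpus` and the parameters `max_workers` and `ignore_case` are hypothetical. Each shard is streamed line by line, much as `grep -n` would scan a file, so no index is built and no shard is loaded into memory in full.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def search_shard(path, pattern):
    """Stream one shard and collect (path, line_number, line) for each match."""
    hits = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for line_no, line in enumerate(f, start=1):
            if pattern.search(line):
                hits.append((str(path), line_no, line.rstrip("\n")))
    return hits


def search_corpus(shard_paths, regex, max_workers=4, ignore_case=True):
    """Run the same regex over every shard concurrently and merge the hits.

    Results come back in shard order because `pool.map` preserves input order.
    """
    pattern = re.compile(regex, re.IGNORECASE if ignore_case else 0)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_shard = pool.map(lambda p: search_shard(p, pattern), shard_paths)
        return [hit for hits in per_shard for hit in hits]
```

A call such as `search_corpus(paths, r"marie\s+curie")` returns every line in any shard that contains the phrase, regardless of case. Threads keep the sketch short, but Python's regex matching holds the global interpreter lock, so a practical engine would more likely use separate processes or invoke `grep` itself on each shard.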