SYNTHESIS NOTE
Agentic Systems and Tool Use · Training, RL, and Test-Time Scaling · Model Architecture and Internals

Can externalizing bookkeeping improve search agent performance?

Does moving routine state management out of the policy and into a stateful environment harness free reinforcement learning to focus on genuine semantic decisions? This note explores whether a division of labor between environment and model improves search efficiency.

Synthesis note · 2026-06-03 · sourced from Reasoning o1 o3 Search

The usual framing of a search agent is a policy over a growing transcript: the model must simultaneously decide what to search and remember what it has seen, which evidence is useful, which constraints remain open, and which claims it actually checked. Harness-1 argues this overloads reinforcement learning — it forces the policy to optimize both genuine semantic search decisions and routine bookkeeping that the environment can maintain far more reliably.

The fix is a division of labor. The harness maintains environment-side working memory: a candidate pool, an importance-tagged curated set, compact evidence links, verification records, deduplicated observations, and budget-aware context rendering. The policy keeps only the semantic decisions — what to query, what to keep or discard, what to verify, and when to stop. A 20B model trained this way reaches 0.730 average curated recall across eight benchmarks, beating the next open searcher by +11.4 points and staying competitive with much larger frontier models.
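To make the split concrete, here is a minimal sketch of what environment-side working memory could look like. It is not Harness-1's actual implementation: the class name `Harness`, its method names, the integer importance tags, and the character-based budget are all hypothetical choices made for illustration. The harness handles deduplication, records, and rendering, while the policy only calls `keep`, `discard`, and `record_verification`.

```python
from dataclasses import dataclass, field


@dataclass
class Harness:
    """Hypothetical environment-side working memory (illustrative only)."""
    budget_chars: int = 200  # assumed budget unit: characters of rendered context
    pool: dict[str, str] = field(default_factory=dict)       # candidate pool: doc_id -> text
    curated: dict[str, int] = field(default_factory=dict)    # curated set: doc_id -> importance tag
    verified: dict[str, bool] = field(default_factory=dict)  # verification records: claim -> outcome

    def observe(self, results: list[tuple[str, str]]) -> int:
        """Bookkeeping: add results to the pool, skip duplicates, return the number of new docs."""
        new = 0
        for doc_id, text in results:
            if doc_id not in self.pool:
                self.pool[doc_id] = text
                new += 1
        return new

    def keep(self, doc_id: str, importance: int) -> None:
        """Semantic decision by the policy: promote a pooled doc into the curated set."""
        if doc_id not in self.pool:
            raise KeyError(f"unknown document: {doc_id}")
        self.curated[doc_id] = importance

    def discard(self, doc_id: str) -> None:
        """Semantic decision by the policy: drop a doc from the curated set."""
        self.curated.pop(doc_id, None)

    def record_verification(self, claim: str, ok: bool) -> None:
        """Bookkeeping: remember which claims were actually checked, and the outcome."""
        self.verified[claim] = ok

    def render(self) -> str:
        """Budget-aware context: most important curated docs first, cut off at the budget."""
        lines: list[str] = []
        used = 0
        for doc_id, importance in sorted(self.curated.items(), key=lambda kv: -kv[1]):
            line = f"[{importance}] {doc_id}: {self.pool[doc_id]}"
            if used + len(line) > self.budget_chars:
                break
            lines.append(line)
            used += len(line)
        return "\n".join(lines)
```

In this sketch the policy never has to recall which documents it has already seen or re-read the full transcript; it sees only what `render()` returns and decides what to query, keep, verify, or stop on.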

The deeper claim is that the harness is not an implementation detail but part of what the policy learns to use. The gains transfer to held-out benchmarks and survive component ablation. This is the search-agent instantiation of a broader principle: capability moves out of parameters and into the editable scaffolding. Read alongside the related note "Is long-context bottleneck really about memory or compute?", externalizing bookkeeping is exactly what frees the policy's scarce reasoning compute for decisions only it can make.

Original note title

search agents should externalize recoverable bookkeeping to a stateful harness so RL only optimizes semantic decisions