SYNTHESIS NOTE

Does limiting reasoning per turn improve multi-turn search quality?

When language models engage in iterative search cycles, does capping reasoning at each turn—rather than just total compute—help preserve context for subsequent retrievals and improve overall search effectiveness?

Synthesis note · 2026-02-21 · sourced from Deep Research

The overthinking cluster established that extended reasoning within a single query degrades accuracy beyond a critical token threshold. ASearcher extends this to multi-turn search: each turn's reasoning must also be capped, but for a different reason. In multi-turn search, the problem is not just variance inflation within one response — it is that excessive reasoning in one turn consumes context that subsequent retrieval rounds need.

The mechanism: in an iterative search cycle (query → retrieve → reason → refine query → retrieve again), each reasoning step takes up context. If turn N uses its full reasoning budget, turn N+1 has less context available to incorporate new retrieved evidence. The search agent effectively degrades its own ability to update on new information by overthinking in early turns.
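The context arithmetic behind this mechanism can be sketched in a few lines. This is a minimal illustration, not ASearcher's implementation. The function name, the token counts, and the assumption that every turn's reasoning and evidence stay in one shared context window are all hypothetical.

```python
def evidence_admitted(context_window, reasoning_per_turn, evidence_per_turn, turns):
    """Return how many evidence tokens fit into the context on each turn.

    Assumes (hypothetically) that everything written in earlier turns
    stays in the context, so later turns only get what is left over.
    """
    used = 0
    admitted = []
    for _ in range(turns):
        # Retrieval happens first: new evidence can only use the remaining room.
        room = context_window - used
        take = min(evidence_per_turn, room)
        admitted.append(take)
        used += take
        # Reasoning over the evidence then consumes more of the same window.
        used += min(reasoning_per_turn, context_window - used)
    return admitted


# Illustrative numbers only: a 32k window, 4k tokens of evidence per turn.
uncapped = evidence_admitted(32_000, 8_000, 4_000, 4)
capped = evidence_admitted(32_000, 2_000, 4_000, 4)
print(uncapped)  # [4000, 4000, 4000, 0]
print(capped)    # [4000, 4000, 4000, 4000]
```

With these made-up numbers, the run that spends 8k tokens of reasoning per turn has no room left for evidence by the fourth retrieval, while the run capped at 2k tokens per turn admits full evidence on every turn.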

This is a distinct failure mode from single-turn overthinking. Single-turn overthinking produces high-variance output from one extended reasoning chain. Multi-turn overthinking produces a degraded retrieval loop in which later turns operate with less fresh evidence than they need. The fix differs accordingly: not just a cap on total compute, but per-turn reasoning budgets that preserve context headroom for subsequent iterations.
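One way to derive a per-turn budget is to reserve room for the evidence of every planned turn first and split what remains evenly across turns. The sketch below illustrates that idea under stated assumptions; it is not the method from the source. The helper names, the even split, and the `answer_reserve` parameter are hypothetical.

```python
def per_turn_reasoning_cap(context_window, planned_turns, evidence_per_turn,
                           answer_reserve=0):
    """Largest reasoning budget per turn that still leaves room for the
    evidence of every planned turn plus a reserve for the final answer.

    All parameters are hypothetical token counts chosen by the agent designer.
    """
    if planned_turns <= 0:
        raise ValueError("planned_turns must be positive")
    fixed = planned_turns * evidence_per_turn + answer_reserve
    spare = context_window - fixed
    if spare < 0:
        raise ValueError("evidence and answer reserve alone exceed the context window")
    # Split the spare context evenly so no single turn can starve later ones.
    return spare // planned_turns


def clip_reasoning(tokens, cap):
    """Truncate one turn's reasoning tokens to the per-turn cap."""
    return tokens[:cap]


cap = per_turn_reasoning_cap(32_000, planned_turns=4, evidence_per_turn=4_000,
                             answer_reserve=2_000)
print(cap)  # 3500
```

A cap on total compute alone would allow the same 14,000 spare tokens to be spent entirely in the first turn. The per-turn cap spreads them so each later retrieval still has headroom.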

The related note "Do iterative refinement methods suffer from overthinking?" describes the same family of problems, and this finding places multi-turn search squarely within it. The timescale is the retrieval cycle rather than the self-revision step, but the mechanism is identical: sequential iteration that amplifies errors rather than correcting them. The practical implication: deep research (DR) agent design must set per-turn reasoning limits, not just overall query time limits.

Original note title

long-horizon research tasks require limiting reasoning steps per turn not just total compute because unrestricted thinking degrades iterative search quality