Does limiting reasoning per turn improve multi-turn search quality?
When language models engage in iterative search cycles, does capping reasoning at each turn—rather than just total compute—help preserve context for subsequent retrievals and improve overall search effectiveness?
The overthinking cluster established that extended reasoning within a single query degrades accuracy beyond a critical token threshold. ASearcher extends this to multi-turn search: each turn's reasoning must also be capped, but for a different reason. In multi-turn search, the problem is not just variance inflation within one response — it is that excessive reasoning in one turn consumes context that subsequent retrieval rounds need.
The mechanism: in an iterative search cycle (query → retrieve → reason → refine query → retrieve again), each reasoning step takes up context. If turn N uses its full reasoning budget, turn N+1 has less context available to incorporate new retrieved evidence. The search agent effectively degrades its own ability to update on new information by overthinking in early turns.
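The context arithmetic behind this mechanism can be sketched in a few lines. This is a minimal illustration, not ASearcher's implementation: `simulate_search`, the `TurnLog` record, and every token count (a 32,000-token window, 3,000 tokens of evidence per retrieval, 8,000 versus 2,000 reasoning tokens per turn) are hypothetical values chosen for the example. It assumes context is append-only and that retrieved evidence is truncated to whatever space remains.

```python
from dataclasses import dataclass


@dataclass
class TurnLog:
    turn: int
    reasoning_tokens: int  # reasoning tokens actually spent this turn
    evidence_tokens: int   # retrieved tokens that actually fit
    headroom_after: int    # context left once the turn is finished


def simulate_search(context_window, turns, evidence_per_turn, reasoning_per_turn):
    """Track how much retrieved evidence fits as the context fills up.

    All quantities are hypothetical token counts, not measurements.
    """
    remaining = context_window
    logs = []
    for turn in range(1, turns + 1):
        # Retrieval: new evidence is truncated to whatever context is left.
        evidence = min(evidence_per_turn, remaining)
        remaining -= evidence
        # Reasoning: the turn spends up to its budget from what is left.
        reasoning = min(reasoning_per_turn, remaining)
        remaining -= reasoning
        logs.append(TurnLog(turn, reasoning, evidence, remaining))
    return logs


uncapped = simulate_search(32_000, turns=6, evidence_per_turn=3_000, reasoning_per_turn=8_000)
capped = simulate_search(32_000, turns=6, evidence_per_turn=3_000, reasoning_per_turn=2_000)

for name, logs in (("uncapped", uncapped), ("capped", capped)):
    per_turn = [log.evidence_tokens for log in logs]
    print(name, per_turn, "total evidence:", sum(per_turn))
```

Under these assumed numbers the uncapped agent has no room left for evidence from turn 4 onward, while the capped agent incorporates evidence in all six turns.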
This is a distinct failure mode from single-turn overthinking. Single-turn overthinking produces high-variance output from one extended reasoning chain. Multi-turn overthinking produces a degraded retrieval loop in which later turns operate with less fresh evidence than they need. The fix is therefore different: not just a cap on total compute, but per-turn reasoning budgets that preserve context headroom for subsequent iterations.
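One way to turn a per-turn budget into a concrete number is to reserve space for every planned retrieval first and split what remains across turns. The sketch below is an illustration under stated assumptions, not a method described in the source: `per_turn_reasoning_cap`, `grant_reasoning`, and the parameters `planned_turns`, `evidence_per_turn`, and `answer_reserve` are hypothetical names, and the even split is only the simplest possible policy.

```python
def per_turn_reasoning_cap(context_window, planned_turns, evidence_per_turn, answer_reserve=0):
    """Largest even per-turn reasoning budget that still leaves room
    for every planned retrieval and for the final answer."""
    if planned_turns <= 0:
        raise ValueError("planned_turns must be positive")
    # Reserve context for all evidence up front, plus the final answer.
    reserved = planned_turns * evidence_per_turn + answer_reserve
    free_for_reasoning = max(context_window - reserved, 0)
    return free_for_reasoning // planned_turns


def grant_reasoning(requested, per_turn_cap, total_remaining):
    """Apply both limits: the per-turn cap and the remaining total budget."""
    return max(min(requested, per_turn_cap, total_remaining), 0)


# Hypothetical numbers: 32k window, 6 turns, 3k evidence per turn, 2k for the answer.
cap = per_turn_reasoning_cap(32_000, planned_turns=6, evidence_per_turn=3_000, answer_reserve=2_000)
print("per-turn cap:", cap)
print("granted:", grant_reasoning(8_000, cap, total_remaining=5_000))
```

The point of keeping two separate limits is that a total cap alone would let a single early turn spend the whole allowance. The per-turn cap is what protects the headroom of later turns.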
In light of the related note "Do iterative refinement methods suffer from overthinking?", this finding places multi-turn search squarely in the same family of problems. The timescale is the retrieval cycle rather than the self-revision step, but the mechanism is identical: sequential iteration that amplifies errors rather than correcting them. The practical implication: deep research (DR) agent design must set per-turn reasoning limits, not just overall query time limits.
Inquiring lines that use this note as a source (59)
This note is a source for these synthesized inquiries.
- Can models learn when to invoke search during reasoning tasks?
- How should we allocate compute between reasoning and retrieval iterations?
- Does parallel retrieval outperform sequential search chains at test time?
- How can per-step decisions about knowledge retrieval improve reasoning over uniform policies?
- What makes proactive tool retrieval better than single-round semantic matching?
- Does full conversation history improve or degrade multi-turn retrieval accuracy?
- Can multi-turn rewards fix models that lose track midway?
- How does hierarchical query planning versus flat prompting affect multi-source retrieval?
- How does search budget affect answer quality at test time?
- How do search tasks differ from derivation tasks in reasoning efficiency?
- Why does multi-turn RL generate orders of magnitude more tokens than single-turn?
- What makes session-aware multi-turn tracking necessary for asynchronous training?
- Can long-context readers handle compositional tasks or just semantic search?
- How should iterative research tasks limit context per reasoning turn?
- Does filtering passages before generation improve large model answer quality?
- How does query planning as a separate step improve multi-hop retrieval coherence?
- Can multi-turn reinforcement learning improve tool use in language models?
- What are retrieval heads and why do they matter for reasoning?
- How does per-token adaptive compute improve efficiency in recurrent reasoning?
- Why does extended reasoning fail for search and knowledge retrieval tasks?
- How much does inference budget improve self-generated search performance?
- Could real-time search systems avoid era sensitivity in legal reasoning?
- How does overthinking in early turns degrade later retrieval rounds?
- Can parallel retrieval chains avoid the context consumption problem?
- What role does search capacity play in making debate more accurate?
- What specific metrics distinguish single-turn versus multi-turn collaboration success?
- Does the parallel versus sequential trade-off appear in retrieval-augmented generation systems?
- Why does single-round retrieval fail on multi-step tasks across different domains?
- What limits exist on retrieval budget during inference?
- Why do reasoning models wander instead of searching systematically?
- Why do long-horizon reasoning tasks need per-turn step limits rather than just compute budgets?
- What distinguishes systematic search from wandering exploration in reasoning?
- Can reasoning in free text then formatting separately recover performance?
- Can multi-turn aware rewards improve alignment beyond single-turn helpfulness?
- Does unrestricted reasoning per search step degrade iterative quality over time?
- What is the optimal balance between search rounds and reasoning depth per round?
- What computational cost does trajectory-bursty inference impose on per-query context requirements?
- How does reflection-based query refinement differ from single-pass retrieval strategies?
- Why do per-turn thinking budgets matter alongside iterative retrieval depth?
- Do expansion-reflection loops and chain-of-retrieval approaches solve the same problem?
- How do complete multi-turn trajectories differ from isolated task examples?
- How do turn-level retrieval failures differ from dialogue-level accumulation failures?
- What update rules should govern dialogue-scoped versus turn-scoped memory?
- How do parallel and sequential retrieval strategies compare in compute efficiency?
- What happens to iterative search quality when reasoning depth is unconstrained?
- Why does iterative refinement fail when information stays constant?
- What distinguishes iterative query refinement from pure self-revision loops?
- Why do single-turn RL methods fail to generalize to multi-turn tasks?
- What quality filters distinguish useful reasoning enrichment from shallow repetition?
- How much does retrieval budget improve when triggered by dual signals instead of fixed intervals?
- Can adaptive per-step decisions outperform uniform retrieval policies across different reasoning tasks?
- How does multi-turn dialogue improve user satisfaction in search interactions?
- How should retrieval systems handle multi-hop reasoning and iterative information needs?
- How do past research mistakes prevent future pivot loops from repeating them?
- How do sleep-time and post-completion methods reduce inference latency?
- What role do cyclic fixed points play in stable reasoning?
- How does the inference steps dial compare to test-time compute trade-offs in language models?
- What makes multi-turn critique trajectories more effective than single-turn reasoning chains?
- How does structured environment state compare to transcript replay for multi-turn reasoning?
Related concepts in this collection (6)
- Does more thinking time always improve reasoning accuracy?
  Explores whether extending a model's thinking tokens linearly improves performance, or if there's a point beyond which additional reasoning becomes counterproductive.
  extends: the overthinking threshold applies within each search turn, not just in single-turn reasoning
- Do iterative refinement methods suffer from overthinking?
  Iterative refinement approaches like Self-Refine structurally resemble token-level overthinking in o1-like models. Does revision across multiple inference calls reproduce the same accuracy degradation seen within single inferences?
  grounds: ASearcher is the retrieval-domain instance of this synthesis insight; multi-turn search is the operational context
- Does extended thinking actually improve reasoning or just increase variance?
  When models think longer, do they reason better, or do they simply sample from a wider distribution of outputs that happens to cover correct answers more often? This matters because it determines whether test-time compute is genuinely scaling reasoning capability.
  extends: per-turn variance inflation compounds across retrieval iterations, not just within one response
- Why does vanilla RAG produce shallow and redundant results?
  Standard RAG systems get stuck in a single semantic neighborhood because their initial query determines what documents are discoverable. The question asks whether fixed retrieval strategies fundamentally limit knowledge depth compared to iterative exploration.
  design constraint complement: OmniThink solves retrieval scope via reflection-expansion; this note solves per-turn depth via reasoning budgets; complete iterative retrieval design requires both
- Can retrieval be extended into multi-step chains like reasoning?
  Standard RAG retrieves once, but multi-hop tasks need intermediate steps. Can we train models to plan retrieval sequences the way chain-of-thought trains reasoning, and scale retrieval at test time?
  CoRAG's tree search offers a structural alternative: instead of sequential deepening that consumes context across turns, branch retrieval chains in parallel and aggregate; best-of-N sampling over retrieval chains avoids the per-turn context pressure
- Can reinforcement learning scale beyond single-turn language tasks?
  Most RL for LLMs targets simple single-turn problems. This research asks whether RL can handle multi-turn interactive environments with sparse rewards and rich environmental feedback, like real software engineering tasks.
  validates: SWE-RL shows RL can learn per-turn discipline through training rather than inference-time limiting; the SWE domain's rich intermediate feedback (compiler traces, test logs) enables RL to discover the same per-turn budgeting that ASearcher imposes architecturally
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
- SSRL: Self-Search Reinforcement Learning
- ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning
- Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses
- The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning
- Test-time Prompt Intervention
- Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
- Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
Original note title
long-horizon research tasks require limiting reasoning steps per turn not just total compute because unrestricted thinking degrades iterative search quality