Can a model's partial response guide what to retrieve next?
Does using the model's in-progress output as a retrieval signal reveal information needs better than the original query alone? This note explores whether generation itself can diagnose which documents are missing.
Standard RAG asks: "what documents are relevant to this query?" before any generation has occurred. The query is the only signal available. For complex tasks, the query is often an inadequate signal — it expresses what was asked but not what is needed to answer it fully.
ITER-RETGEN (Iterative Retrieval-Generation Synergy) demonstrates an alternative: use the model's current response to the task, combined with the original question, as the retrieval query. The model's response "shows what might be needed to finish the task": it contains implicit signals about the gaps between what has been answered and what remains unaddressed.
The synergy is iterative: generate a response → use the response as the retrieval query → retrieve more relevant documents → regenerate with the new context → repeat. Each generation round surfaces implicit information needs that the original query did not express. The ITER-RETGEN paper reports gains over single-pass RAG on multi-hop question answering, fact verification, and commonsense reasoning.
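The loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ITER-RETGEN implementation: the token-overlap retriever, the `echo_top_passage` stand-in for an LLM, the toy corpus, and the function name `iter_retgen` are all hypothetical. The sketch also assumes the next query is the draft joined with the original question, and that each round replaces the previous context rather than accumulating it.

```python
import re
from typing import Callable, List, Set, Tuple

# Hypothetical stopword list so that overlap is driven by content words.
STOPWORDS = {"the", "is", "of", "in", "was", "which", "what", "a", "an"}


def tokenize(text: str) -> Set[str]:
    """Lowercase, split into alphanumeric tokens, drop stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def retrieve(query: str, corpus: List[str], k: int) -> List[str]:
    """Toy retriever: rank passages by token overlap with the query.

    sorted() is stable, so tied passages keep their corpus order.
    """
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return [doc for doc in ranked[:k] if q & tokenize(doc)]


def echo_top_passage(question: str, docs: List[str]) -> str:
    """Stand-in for an LLM call: 'answers' by echoing the top-ranked passage."""
    return docs[0] if docs else ""


def iter_retgen(
    question: str,
    corpus: List[str],
    generate: Callable[[str, List[str]], str],
    iterations: int = 2,
    k: int = 2,
) -> Tuple[str, List[List[str]]]:
    """Alternate retrieval and generation for a fixed number of rounds."""
    query = question  # round 1 has only the question as a signal
    draft = ""
    history: List[List[str]] = []
    for _ in range(iterations):
        docs = retrieve(query, corpus, k)
        draft = generate(question, docs)
        history.append(docs)
        # Assumption: the next query is the draft joined with the question.
        query = f"{draft} {question}"
    return draft, history


CORPUS = [
    "Paris is the capital of France.",
    "Albert Einstein was born in Ulm.",
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
]
QUESTION = "Marie Curie was born in the capital of which country?"

draft, history = iter_retgen(QUESTION, CORPUS, echo_top_passage)
for round_number, docs in enumerate(history, start=1):
    print(f"round {round_number}: {docs}")
```

In this toy run the question alone retrieves the Curie passage plus the Paris passage as a distractor, because "capital" matches several passages equally. Once the draft mentions Warsaw, the second-round query retrieves the Poland passage instead, which is the hop the original question never named.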
This reframes what generation is for in RAG pipelines. Generation is not only the terminal output step — it is also a diagnostic step that identifies what retrieval should target next. The generator functions as both an answer producer and an information-need clarifier.
This mirrors human information seeking: people working on complex research do not submit all their queries upfront. They read, work out what they know and do not know, then query for the specific gaps that reading revealed. ITER-RETGEN operationalizes this workflow.
Inquiring lines that use this note as a source (33)
This note is a source for these synthesized inquiries.
- How do archive systems handle knowledge that changes with each generation?
- What is the mechanistic signature when models chain facts never presented together?
- Can models identify what information they are missing in underspecified problems?
- Why does long-form generation need different retrieval than factoid questions?
- Why does retrieval quality sometimes conflict with final answer quality?
- Can models identify information gaps without just guessing or refusing to answer?
- What causes the retrieval-augmented generation to fail in practice?
- How can smaller models help select useful data for larger models?
- Does filtering passages before generation improve large model answer quality?
- Should production CRS systems combine multiple retrieval strategies in a hybrid approach?
- Can models learn to identify what information is missing from questions?
- How does proactive critical thinking enable models to identify missing information?
- Does the parallel versus sequential trade-off appear in retrieval-augmented generation systems?
- What happens when prompt-optimized results lack anchoring in real data?
- Can factually wrong generated documents still improve retrieval accuracy?
- When should a system decide to retrieve versus reason alone?
- What makes draft-centric systems better anchors for coherence than feed-forward outputs?
- Can retrieval strategies drive both draft refinement and new research question generation?
- Can generator feedback backpropagate through the entire retrieval pipeline?
- Do bidirectional and any-order generation expose different parts of the joint distribution?
- Can models distinguish between ambiguous and incomplete information inputs?
- What consumption data would validate the limited-consumption model in production systems?
- How does proactive critical thinking detect when information is incomplete?
- How does routing decide between models before generation happens?
- Should retrieval be triggered by model uncertainty or fixed intervals?
- How does response content compare to model confidence as a retrieval trigger?
- How should retrieval systems decide when to fetch new information?
- What role does document reranking play alongside decisions about whether to retrieve?
- Can we cheaply estimate which samples are currently most informative?
- How should retrieval triggers use model uncertainty instead of fixed intervals?
- Can retrieval systems decide when to retrieve instead of always querying?
- Why does production retrieval augmented generation underperform in real deployments?
- What would instruction-following retrieval enable that query-only systems cannot?
Related concepts in this collection (4)
- When should retrieval happen during model generation?
Explores whether retrieval should occur continuously, at fixed intervals, or only when the model signals uncertainty. Standard RAG retrieves once; long-form generation requires dynamic triggering based on confidence signals.
Complementary trigger mechanism: FLARE uses confidence (token probability) as the signal, whereas ITER-RETGEN uses response content (what was generated so far).
- What makes deep research fundamentally different from RAG?
Explores whether current systems using the label 'deep research' actually meet a rigorous three-component definition involving multi-step gathering, cross-source synthesis, and iterative refinement, or if they're performing something narrower.
ITER-RETGEN is an implementation of iterative refinement, the third component of deep research, applied to the retrieval query.
- Do iterative refinement methods suffer from overthinking?
Iterative refinement approaches like Self-Refine structurally resemble token-level overthinking in o1-like models. Does revision across multiple inference calls reproduce the same accuracy degradation seen within single inferences?
ITER-RETGEN is iterative refinement with an external information escape: each iteration retrieves new evidence rather than re-processing the same context, which avoids the variance inflation that pure self-revision causes. The contrast suggests that iterative refinement fails when information is held constant but may succeed when each iteration adds genuinely new knowledge.
- Does revising your own reasoning actually help or hurt?
Self-revision in reasoning models often degrades accuracy, while external critique improves it. Understanding what makes revision helpful or harmful could reshape how we design systems that need to correct themselves.
ITER-RETGEN is a concrete implementation of external-signal-driven revision: the signal that drives each revision is external (newly retrieved documents), not internal (the model's assessment of its own output). This positions ITER-RETGEN on the side of revisions that help rather than hurt.
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation
- Generator-Retriever-Generator: A Novel Approach to Open-domain Question Answering
- DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
- Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy
- RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
- Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search
- CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
Original note title
model response quality is a retrieval signal — the partial answer reveals what information is still needed