SYNTHESIS NOTE

Can a model's partial response guide what to retrieve next?

Does using the model's in-progress output as a retrieval signal reveal information needs better than the original query alone? This note explores whether generation itself can diagnose which documents are missing.

Synthesis note · 2026-02-22 · sourced from RAG

Standard RAG asks: "what documents are relevant to this query?" before any generation has occurred. The query is the only signal available. For complex tasks, the query is often an inadequate signal — it expresses what was asked but not what is needed to answer it fully.

ITER-RETGEN (Iterative Retrieval-Generation Synergy) demonstrates an alternative: use the model's current response to the task as the retrieval query. The model's response "shows what might be needed to finish the task" — it contains implicit signals about the gaps between what has been answered and what remains unaddressed.

The synergy is iterative: generate a response → use response as retrieval query → retrieve more relevant documents → regenerate with new context → repeat. Each generation round surfaces new implicit information needs that the original query did not express. Performance on multi-hop question answering, fact verification, and commonsense reasoning improves substantially over single-pass RAG.
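The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ITER-RETGEN implementation: `retrieve` is a toy word-overlap retriever, `echo_generate` is a hypothetical stand-in for an LLM call that simply restates its evidence, and the three-sentence corpus is made up for the demo. The sketch also assumes the retrieval query is the question concatenated with the previous draft, so the first round (empty draft) behaves like single-pass RAG.

```python
import re
from typing import Callable, List, Set, Tuple

# Small stopword list so that function words do not count as matches (assumption).
STOPWORDS = {"the", "of", "was", "is", "in", "by", "which", "a"}


def content_words(text: str) -> Set[str]:
    """Lowercase the text and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by word overlap, drop zero-overlap ones."""
    q = content_words(query)
    scored = [(len(q & content_words(doc)), doc) for doc in corpus]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep corpus order
    return [doc for _, doc in hits[:k]]


def echo_generate(question: str, docs: List[str]) -> str:
    """Hypothetical stand-in for an LLM: the draft restates the retrieved evidence."""
    return " ".join(docs)


def iter_retgen(
    question: str,
    corpus: List[str],
    generate: Callable[[str, List[str]], str],
    rounds: int = 2,
    k: int = 2,
) -> Tuple[str, List[List[str]]]:
    """Alternate retrieval and generation, feeding each draft back into the query."""
    response = ""
    history: List[List[str]] = []
    for _ in range(rounds):
        # The previous draft is appended to the question to form the next query.
        query = f"{question} {response}".strip()
        docs = retrieve(query, corpus, k)
        history.append(docs)
        response = generate(question, docs)
    return response, history


CORPUS = [
    "Inception was directed by Christopher Nolan.",
    "Paris is the capital of France.",
    "Christopher Nolan was born in London.",
]
QUESTION = "Which city is the birthplace of the director of Inception?"

answer, history = iter_retgen(QUESTION, CORPUS, echo_generate)
for round_no, docs in enumerate(history, start=1):
    print(f"round {round_no}: {docs}")
print("final draft:", answer)
```

In this toy run the question alone matches only the sentence about who directed the film. Once the first draft names the director, the second query also matches the birthplace sentence, which the original question shares no content words with. A real system would replace both stubs with a proper retriever and a language model, and would need a stopping rule beyond a fixed round count.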

This reframes what generation is for in RAG pipelines. Generation is not only the terminal output step — it is also a diagnostic step that identifies what retrieval should target next. The generator functions as both an answer producer and an information-need clarifier.

This mirrors human information seeking: people working on complex research do not submit all their queries upfront. They read, take stock of what they know and do not know, then query for the specific gaps that the reading revealed. ITER-RETGEN operationalizes this workflow.

Inquiring lines that read this note 34

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

Can AI-generated outputs constitute genuine knowledge or valid claims?

How do archive systems handle knowledge that changes with each generation?

Why do reasoning models fail at systematic problem-solving and search?

How can models identify insufficient information and respond appropriately without guessing?

How should retrieval systems optimize for multi-step reasoning during inference?

When should retrieval-augmented systems decide to fetch new information?

What are the consequences of models training on synthetic data?

How can smaller models help select useful data for larger models?

How should dialogue systems best leverage conversation history for retrieval?

Should production CRS systems combine multiple retrieval strategies in a hybrid approach?

How can identical external performance mask different internal representations?

What happens when prompt-optimized results lack anchoring in real data?

How do prompt structure and constraints affect model instruction reliability?

What makes draft-centric systems better anchors for coherence than feed-forward outputs?

How should iterative research systems allocate reasoning per search step?

What structural advantages do diffusion language models offer over autoregressive methods?

Do bidirectional and any-order generation expose different parts of the joint distribution?

What dimensions of recommendation quality do standard metrics miss?

What consumption data would validate the limited-consumption model in production systems?

Can model routing outperform monolithic scaling as an efficiency strategy?

How does routing decide between models before generation happens?

What makes weaker teacher models effective for stronger student training?

Can we cheaply estimate which samples are currently most informative?

What determines success in training models on multiple tasks?

When and what should a model actually decide to delegate?

Related concepts in this collection 4

When should retrieval happen during model generation?
Explores whether retrieval should occur continuously, at fixed intervals, or only when the model signals uncertainty. Standard RAG retrieves once; long-form generation requires dynamic triggering based on confidence signals.
Relation: a complementary trigger mechanism. FLARE uses confidence (token probability) as the signal; ITER-RETGEN uses response content (what was generated so far) as the signal.

What makes deep research fundamentally different from RAG?
Explores whether current systems using the label 'deep research' actually meet a rigorous three-component definition involving multi-step gathering, cross-source synthesis, and iterative refinement, or if they're performing something narrower.
Relation: ITER-RETGEN is an implementation of iterative query refinement, the third component of deep research.

Do iterative refinement methods suffer from overthinking?
Iterative refinement approaches like Self-Refine structurally resemble token-level overthinking in o1-like models. Does revision across multiple inference calls reproduce the same accuracy degradation seen within single inferences?
Relation: ITER-RETGEN is iterative refinement with an external information escape. Each iteration retrieves new evidence rather than re-processing the same context, avoiding the variance inflation that pure self-revision causes. The contrast shows that iterative refinement fails when information is held constant but may succeed when each iteration adds genuinely new knowledge.

Does revising your own reasoning actually help or hurt?
Self-revision in reasoning models often degrades accuracy, while external critique improves it. Understanding what makes revision helpful or harmful could reshape how we design systems that need to correct themselves.
Relation: ITER-RETGEN is a concrete implementation of external-signal-driven revision. The retrieval signal is external (new documents), not internal (the model's assessment of its own output), which positions ITER-RETGEN on the side of revisions that help rather than hurt.

Original note title

model response quality is a retrieval signal — the partial answer reveals what information is still needed
