Can fine-tuning replace query augmentation for retrieval?
Query augmentation helps retrievers handle ambiguous queries but increases input cost. Does fine-tuning the retrieval model achieve comparable performance without this overhead?
CoT query augmentation for RAG generates additional context before retrieval: a chain-of-thought that expands an ambiguous query into richer text for the retrieval model to match against. For pretrained retrievers facing underspecified queries, this added context closes the gap between what was asked and what is actually needed.
The catch: CoT augmentation increases input sequence length. Longer inputs to the LLM cost more, and retrieval quality is sensitive to where relevant information falls in the context window. The augmentation adds a cost in exchange for a performance gain.
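A minimal sketch of both sides of this trade-off, assuming a toy bag-of-words retriever in place of a real embedding model. The document set, the `cot_expand` function, and its canned reasoning text are all hypothetical; a real pipeline would call an LLM to write the chain-of-thought.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus; the names and descriptions are illustrative only.
DOCS = {
    "calendar_api": "create calendar event meeting schedule reminder",
    "maps_api": "driving directions route traffic navigation between places",
    "weather_api": "get current weather forecast temperature rain for a city",
}

def tokens(text):
    """Lowercase word tokens, punctuation stripped."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_match(query):
    """Return (doc_name, score) for the highest-scoring document."""
    q = Counter(tokens(query))
    scored = {name: cosine(q, Counter(tokens(doc))) for name, doc in DOCS.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]

def cot_expand(query):
    """Hypothetical stand-in for an LLM call that writes a chain-of-thought."""
    reasoning = ("the user wants to know whether it will rain, "
                 "so a weather forecast for their city is needed")
    return f"{query} {reasoning}"

query = "should I bring an umbrella"
augmented = cot_expand(query)

plain_name, plain_score = best_match(query)        # no lexical overlap with any document
aug_name, aug_score = best_match(augmented)        # overlaps with the weather document
extra_tokens = len(tokens(augmented)) - len(tokens(query))  # the added input cost
```

In this toy setup the implicit query shares no terms with any document, so every score is zero, while the expanded query matches the weather document. The price is the extra tokens carried into every later step.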
Context Tuning for RAG demonstrates that fine-tuning the retrieval model removes this trade-off. A semantic search model fine-tuned on implicit queries achieves comparable retrieval performance without CoT augmentation. Once fine-tuning is applied, adding CoT yields only marginal additional gain, because the model has already learned during training to bridge the ambiguity gap.
The mechanism: pretrained retrievers struggle with ambiguous/implicit queries because they were trained on explicit query-document pairs. Fine-tuning on implicit queries with usage signals (frequency, history, geo-temporal correlation) teaches the model to resolve ambiguity from context rather than requiring it to be spelled out.
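The mechanism can be illustrated with a deliberately tiny stand-in: a linear ranker over three hand-made features, fine-tuned with a pairwise logistic loss. This is not the method from the paper, and the feature names, values, and hyperparameters are hypothetical. It only shows how training on implicit queries can shift weight onto usage signals that a purely lexical "pretrained" ranker ignores.

```python
import math

# Each candidate is described by [lexical_match, usage_frequency, recently_used].
# All values are hypothetical illustrations, not data from any paper.
TRAIN_PAIRS = [
    # (features of the item the user actually wanted, features of a distractor)
    ([0.0, 0.9, 1.0], [0.0, 0.1, 0.0]),
    ([0.1, 0.7, 1.0], [0.1, 0.2, 0.0]),
    ([0.0, 0.8, 0.0], [0.0, 0.3, 0.0]),
]

def score(weights, features):
    """Linear relevance score."""
    return sum(w * x for w, x in zip(weights, features))

def fine_tune(weights, pairs, lr=0.5, epochs=50):
    """Gradient descent on the pairwise logistic loss log(1 + exp(-margin))."""
    weights = list(weights)
    for _ in range(epochs):
        for pos, neg in pairs:
            diff = [p - n for p, n in zip(pos, neg)]
            margin = score(weights, diff)
            # d/dw of log(1 + exp(-margin)) is -sigmoid(-margin) * diff
            step = 1.0 / (1.0 + math.exp(margin))
            weights = [w + lr * step * d for w, d in zip(weights, diff)]
    return weights

# "Pretrained" ranker: relies on lexical match only.
pretrained = [1.0, 0.0, 0.0]
tuned = fine_tune(pretrained, TRAIN_PAIRS)

# An implicit query: neither candidate matches lexically.
wanted, distractor = [0.0, 0.6, 1.0], [0.0, 0.4, 0.0]
pretrained_gap = score(pretrained, wanted) - score(pretrained, distractor)
tuned_gap = score(tuned, wanted) - score(tuned, distractor)
```

Before tuning, the two candidates tie because the only feature the ranker uses is zero for both. After tuning, the usage features carry positive weight and the wanted item ranks first without any query expansion.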
This is an instance of a recurring pattern across LLM research: inference-time workarounds (chain-of-thought, query augmentation) represent the gap between what a model can do and what the task requires. Fine-tuning can close that gap and retire the workaround. The workaround's cost is then avoidable.
The practical corollary: query augmentation strategies should be evaluated against fine-tuned retrieval baselines, not just pretrained baselines. The augmentation is solving a training distribution problem, not an inherent query complexity problem.
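One way to set up that comparison is a grid that crosses every retriever with every augmentation strategy and reports mean recall@k per cell. The sketch below assumes that shape. The retrievers, the augmenter, and the one-item dataset are hypothetical stubs, rigged to mirror the pattern described above, so their numbers are not evidence of anything.

```python
from itertools import product

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant items that appear in the top-k ranked results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def evaluate_grid(retrievers, augmenters, dataset, k=5):
    """Mean recall@k for every (retriever, augmenter) combination."""
    if not dataset:
        raise ValueError("dataset must contain at least one (query, relevant_ids) pair")
    results = {}
    for (r_name, retrieve), (a_name, augment) in product(retrievers.items(), augmenters.items()):
        scores = [recall_at_k(retrieve(augment(q)), relevant, k) for q, relevant in dataset]
        results[(r_name, a_name)] = sum(scores) / len(scores)
    return results

# Hypothetical stubs; replace with real models and a real CoT generator.
def pretrained_retriever(query):
    return ["weather"] if "rain" in query else ["calendar"]

def finetuned_retriever(query):
    return ["weather"] if ("rain" in query or "umbrella" in query) else ["calendar"]

def no_augmentation(query):
    return query

def cot_augmentation(query):
    return query + " (the user is asking whether it will rain)"

dataset = [("should I bring an umbrella", ["weather"])]

grid = evaluate_grid(
    retrievers={"pretrained": pretrained_retriever, "finetuned": finetuned_retriever},
    augmenters={"none": no_augmentation, "cot": cot_augmentation},
    dataset=dataset,
)
```

The cell to watch is `("finetuned", "none")`: if it matches `("pretrained", "cot")`, the augmentation was compensating for the training distribution, and the comparison against a pretrained baseline alone would have overstated its value.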
Inquiring lines that use this note as a source 16
This note is a source for these synthesized inquiries.
- Do Doc2Query approaches suffer from the same misaligned-target problem?
- Why do longer queries benefit less from clarification questions?
- What causes the retrieval-augmented generation to fail in practice?
- What makes retrieval augmentation more effective than simply increasing embedding size?
- Why do pretrained retrievers struggle with ambiguous or implicit queries?
- How should query augmentation strategies be properly evaluated against baselines?
- What hidden costs might fine-tuning retrieval models introduce on out-of-distribution queries?
- Could eliminating retrieval entirely work better than shifting the burden?
- How does query planning as a separate step improve multi-hop retrieval coherence?
- How does semantic mismatch between user language and API documentation degrade tool retrieval?
- Can retrieval augmentation and Bayesian approaches both solve the sparsity problem?
- How does reflection-based query refinement differ from single-pass retrieval strategies?
- How do retrieval and fine-tuning trade off flexibility against training cost?
- What distinguishes iterative query refinement from pure self-revision loops?
- Does retrieval quality depend more on access structure or write gating?
- Why does production retrieval augmented generation underperform in real deployments?
Related concepts in this collection 2
- Can prompt optimization teach models knowledge they lack?
  Explores whether sophisticated prompting techniques can inject new domain knowledge into language models, or if they're limited to activating existing training knowledge.
  Same principle in the inverse direction: augmentation cannot inject knowledge the retriever lacks, but fine-tuning can train the model to handle what augmentation was compensating for.
- Does supervised fine-tuning actually improve reasoning quality?
  While SFT boosts final-answer accuracy, does it degrade the quality and informativeness of the reasoning steps that justify those answers? This matters for high-stakes domains requiring auditable decision-making.
  Caution on fine-tuning: domain SFT sometimes degrades rather than helps; retrieval fine-tuning may have analogous hidden costs on out-of-distribution queries.
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Large Language Models For Social Networks: Applications, Challenges, And Solutions
- Query Rewriting for Retrieval-Augmented Large Language Models
- Context Tuning for Retrieval Augmented Generation
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
- On the Theoretical Limitations of Embedding-Based Retrieval
- RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
- Aligning Language Models to Explicitly Handle Ambiguity
- Query Understanding in the Age of Large Language Models
Original note title
fine-tuning the retrieval model eliminates the need for query augmentation