Does including all conversation history actually help retrieval?
Conversational search systems typically use all previous context to understand current queries. But do topic switches in multi-turn conversations inject noise that degrades performance rather than helps it?
A common assumption in conversational search and QA is that including all previous conversation context helps the model understand the current query. Two independent lines of research show that this assumption does not hold once a conversation switches topics.
The problem: topic switches within a conversation session are common. A user might discuss restaurants, then switch to hotels, then return to restaurants. Using ALL previous queries to expand the current query "will inevitably inject irrelevant information into the expanded query and result in sub-optimal queries."
Two complementary solutions:
Learning to Relate proposes selecting useful previous queries based on whether they improve retrieval effectiveness for the current query. A multi-task learning method jointly optimizes query selection and dense retrieval — and the automated selection outperforms human annotations because the model optimizes for retrieval quality while humans optimize for semantic understanding.
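The selection criterion above can be sketched as a pseudo-labeling step: a previous turn counts as useful only if appending it to the current query improves the rank of the known relevant passage. This is a minimal illustration, not the paper's implementation. The lexical `score` function is a stand-in for a dense retriever, the reciprocal-rank metric and pessimistic tie handling are assumptions, and all names and example data are hypothetical. The multi-task training that learns a selector from such labels is not shown.

```python
from collections import Counter

def tokenize(text):
    # Lowercase, strip basic punctuation, drop empty tokens.
    return [w for w in (t.strip(".,?!").lower() for t in text.split()) if w]

def score(query, passage):
    # Toy lexical overlap; an assumed stand-in for a dense retriever.
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum(min(q[t], p[t]) for t in q)

def reciprocal_rank(query, passages, gold_idx):
    # Ties are resolved pessimistically: a tied passage outranks the gold one.
    scores = [score(query, p) for p in passages]
    rank = 1 + sum(
        1 for i, s in enumerate(scores)
        if i != gold_idx and s >= scores[gold_idx]
    )
    return 1.0 / rank

def select_useful_turns(history, current, passages, gold_idx):
    # Keep a previous turn only if expanding with it strictly improves retrieval.
    base = reciprocal_rank(current, passages, gold_idx)
    return [
        i for i, turn in enumerate(history)
        if reciprocal_rank(f"{turn} {current}", passages, gold_idx) > base
    ]

# Hypothetical session: a restaurant turn, then a hotel turn (topic switch).
passages = [
    "Luigi's is an Italian restaurant downtown with outdoor seating",  # relevant
    "The Grand Hotel downtown offers seating in its lobby bar",
    "Parking downtown is limited on weekends",
]
history = [
    "Find an Italian restaurant downtown",
    "What hotels are near the airport",
]
current = "Does it have seating?"

print(select_useful_turns(history, current, passages, gold_idx=0))
```

In this toy session only the restaurant turn is kept. The hotel turn is dropped because expanding with it does not improve the rank of the relevant passage, which is the kind of irrelevant expansion the note describes.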
DHS-ConvQA uses entity-based similarity between history turns and the current question, then applies attention-based re-ranking to weight useful terms. A binary classification task highlights useful terms (predicted as 1) and ignores irrelevant ones (predicted as 0).
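A rough sketch of the two ideas in this paragraph, under loud assumptions: entities are looked up in a small hypothetical vocabulary rather than extracted by a real entity recognizer, similarity is plain Jaccard overlap, and the 1/0 term labels come from a hand-written rule rather than the learned classifier and attention-based re-ranking the method actually uses. All function names and data are illustrative.

```python
def entities(text, vocab):
    # Assumed entity extraction: intersect lowercased tokens with a known vocabulary.
    tokens = {w.strip(".,?!").lower() for w in text.split()}
    return tokens & vocab

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_history(history, question, vocab):
    # Order history turns by entity overlap with the current question.
    q_ents = entities(question, vocab)
    scored = [(jaccard(entities(t, vocab), q_ents), i) for i, t in enumerate(history)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

def term_labels(turn, question, vocab):
    # Rule-based stand-in for the binary term classifier:
    # 1 = term is an entity shared with the question, 0 = ignore.
    q_ents = entities(question, vocab)
    return [(w, 1 if w.strip(".,?!").lower() in q_ents else 0) for w in turn.split()]

# Hypothetical vocabulary and conversation.
vocab = {"paris", "louvre", "tokyo", "sushi"}
history = [
    "Which museums are in Paris?",
    "Where can I eat sushi in Tokyo?",
    "Is the Louvre open on Mondays?",
]
question = "How far is the Louvre from central Paris?"

ranking = rank_history(history, question, vocab)
top_turns = [i for _, i in ranking[:2]]
print(top_turns)
print(term_labels(history[0], question, vocab))
```

The off-topic Tokyo turn shares no entities with the question, so it ranks last and every one of its terms is labeled 0.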
The key finding generalizes: for both conversational search and conversational QA, selective context is better than full context. This challenges the assumption that more context is always better — an assumption shared by RAG systems and long-context models.
Relative to the failure described in "Why do language models fail in gradually revealed conversations?", the selective history mechanism addresses a specific form of getting lost: previous turns about a different topic bias the model's interpretation of the current turn. The fix is not better reasoning over more context but better selection of which context to include. This is the retrieval-side complement to "Why do language models engage with conversational distractors?", which addresses the same problem at generation time. In both cases models lack the ability to recognize and resist topical diversion, whether it comes from their own context window (selective history) or from user behavior (topic-following).
Two additional failure modes from conversational memory research (2406.00057): beyond topic switches, conversational retrieval faces two challenges absent from static database retrieval. (1) Time- or event-based queries: users ask "what did we discuss yesterday?" or "summarize Jason's points from January 6th", which require retrieval by temporal metadata, not semantic similarity. (2) Ambiguous queries: pronouns and demonstratives ("tell me more about that") require surrounding conversational context to disambiguate before retrieval can occur. Standard vector-DB RAG fails on both. The combined solution chains table-based search (for metadata), vector-database retrieval (for content), and disambiguation prompting (for resolving ambiguous references). See also the note "Why do time-based queries fail in conversational retrieval systems?"
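The chaining idea can be sketched as a small router that decides which steps a query needs before content retrieval runs. Everything here is an assumption for illustration: the pronoun regex is a crude heuristic (it also fires on "that" used as a conjunction), temporal parsing handles only "yesterday" and ISO dates, the in-memory `LOG` stands in for a metadata table, and the disambiguation step only builds a prompt string instead of calling a model.

```python
import re
from datetime import date, timedelta

# Crude heuristic for unresolved references; an assumption, not a real detector.
AMBIGUOUS = re.compile(r"\b(that|this|it|those|these|they)\b", re.IGNORECASE)
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def parse_temporal(query, today):
    # Handles only "yesterday" and ISO dates such as 2024-01-06.
    if "yesterday" in query.lower():
        return today - timedelta(days=1)
    m = ISO_DATE.search(query)
    if m:
        return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    return None

def route(query, today):
    # Decide which steps to chain before content retrieval.
    steps = []
    if AMBIGUOUS.search(query):
        steps.append("disambiguate")
    if parse_temporal(query, today) is not None:
        steps.append("table_search")
    steps.append("vector_search")
    return steps

def table_search(log, day):
    # Metadata lookup: filter turns by date instead of semantic similarity.
    return [entry for entry in log if entry["day"] == day]

def disambiguation_prompt(query, recent_turns):
    # Builds the rewrite prompt; sending it to a model is out of scope here.
    context = "\n".join(recent_turns)
    return (
        f"Conversation so far:\n{context}\n\n"
        f"Rewrite the question so it stands alone: {query}"
    )

# Hypothetical conversation log with temporal metadata.
LOG = [
    {"day": date(2024, 1, 5), "speaker": "Ana", "text": "Budget review"},
    {"day": date(2024, 1, 6), "speaker": "Jason", "text": "Launch timeline"},
]
today = date(2024, 1, 7)

print(route("What did we discuss yesterday?", today))
print(route("Tell me more about that", today))
print(table_search(LOG, parse_temporal("What did we discuss yesterday?", today)))
```

A temporal query is narrowed by the metadata filter before any similarity search runs, and an ambiguous query is rewritten first so that the vector search receives a standalone question.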
Inquiring lines that use this note as a source (15)
This note is a source for these synthesized inquiries.
- Why do bag-of-mentions models discard conversation order in the first place?
- Why do conversational queries drift away from what triggered them?
- How do time gaps between conversations change what chatbots should remember?
- Does full conversation history improve or degrade multi-turn retrieval accuracy?
- How does selective history retrieval improve conversational search accuracy?
- Why does selective context retrieval outperform including all historical information?
- What is the relationship between topic following and topic revisitation in conversation?
- Can concept-based search bridge the vocabulary mismatch between conversation and item index?
- What interaction history signals indicate what a participant finds relevant?
- What makes pronouns and demonstratives problematic in conversational retrieval systems?
- Why does selective conversation history outperform including all prior context?
- What makes two conversation turns the same thread rather than different threads?
- Can compressive memory track what matters most across 35 conversation sessions?
- Can selective history filtering address topic drift that generation-time topic following cannot prevent?
- How does multi-turn dialogue improve user satisfaction in search interactions?
Related concepts in this collection (6)
- Why do language models fail in gradually revealed conversations?
  Explores why LLMs perform 39% worse when instructions arrive incrementally rather than upfront, and whether they can recover from early mistakes in multi-turn dialogue.
  selective history prevents one specific mechanism of getting lost (irrelevant context injection)
- Can long-context models resolve retriever-reader imbalance?
  Traditional RAG systems force retrievers to find precise passages because readers had small context windows. Do modern long-context LLMs change what architecture makes sense?
  selective history is a retriever-side approach; the reader-side approach may complement
- When should retrieval happen during model generation?
  Explores whether retrieval should occur continuously, at fixed intervals, or only when the model signals uncertainty. Standard RAG retrieves once; long-form generation requires dynamic triggering based on confidence signals.
  both argue for adaptive rather than fixed retrieval strategies
- Why do language models engage with conversational distractors?
  Explores why state-of-the-art LLMs struggle to maintain topical focus when users introduce off-topic turns, despite having explicit scope instructions. This gap suggests models lack training signals for ignoring irrelevant directions.
  both identify topic boundary management as a critical missing capability: selective history addresses it at retrieval time (filtering irrelevant previous turns), topic-following addresses it at generation time (resisting topical diversion)
- Why do dialogue systems lose context when topics return?
  Stack-based dialogue management removes topics after they're resolved, making it hard for systems to reference them later. Does this structural rigidity explain why conversational AI struggles with topic revisitation?
  selective history is the retrieval-side implementation of flexible topic management: rather than rigid stack structures that lose context when topics are popped, selective retrieval dynamically identifies which prior turns are relevant regardless of structural position, enabling topic revisitation without the contamination from intervening topic switches
- Why do users drift away from their original information need?
  When users know their knowledge is incomplete but cannot articulate what's missing, do they unintentionally shift topics? And can real-time systems detect this drift?
  ASK-driven drift generates the topic switches that selective history must filter: users in anomalous knowledge states drift unintentionally, creating the irrelevant context injection that entity-based selection mechanisms must detect and exclude
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Learning to Relate to Previous Turns in Conversational Search
- Learning to Select the Relevant History Turns in Conversational Question Answering
- Dialogue Transformers
- Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory
- Making Sense of Memory in AI Agents
- Toward Conversational Agents with Context and Time Sensitive Long-term Memory
- Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations
- Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries
Original note title
selective history retrieval outperforms full-context inclusion in conversational search — topic switches within sessions inject irrelevant information that degrades retrieval