Does including all conversation history actually help retrieval?

Conversational search systems typically use all previous context to understand current queries. But do topic switches in multi-turn conversations inject noise that degrades performance rather than helps it?

Synthesis note · 2026-02-22 · sourced from Conversation Architecture Structure

A common assumption in conversational search and QA is that including all previous conversation context helps the model understand the current query. Two independent lines of research indicate that this assumption does not hold once a conversation switches topics.

The problem: topic switches within a conversation session are common. A user might discuss restaurants, then switch to hotels, then return to restaurants. Using ALL previous queries to expand the current query "will inevitably inject irrelevant information into the expanded query and result in sub-optimal queries."

Two complementary solutions:

Learning to Relate proposes selecting useful previous queries based on whether they improve retrieval effectiveness for the current query. A multi-task learning method jointly optimizes query selection and dense retrieval. The automated selection outperforms human annotations because the model optimizes for retrieval quality while humans optimize for semantic understanding.
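The selection criterion described above (keep a previous query only if it improves retrieval for the current one) can be sketched in a few lines. This is a minimal illustration, not the Learning to Relate method itself: it swaps the learned dense retriever and multi-task training for a toy token-overlap ranker, and every name here (`rank`, `reciprocal_rank`, `select_useful_history`, the sample documents) is hypothetical.

```python
from typing import Dict, List

def tokenize(text: str) -> set:
    return set(text.lower().split())

def rank(query: str, docs: Dict[str, str]) -> List[str]:
    """Toy lexical retriever (an assumption): rank doc ids by token overlap."""
    q = tokenize(query)
    # Sort by descending overlap, then by doc id so ties are deterministic.
    return sorted(docs, key=lambda d: (-len(q & tokenize(docs[d])), d))

def reciprocal_rank(query: str, docs: Dict[str, str], relevant_id: str) -> float:
    return 1.0 / (rank(query, docs).index(relevant_id) + 1)

def select_useful_history(history: List[str], current: str,
                          docs: Dict[str, str], relevant_id: str) -> List[str]:
    """Keep a previous query only if appending it improves reciprocal rank."""
    base = reciprocal_rank(current, docs, relevant_id)
    return [h for h in history
            if reciprocal_rank(current + " " + h, docs, relevant_id) > base]

# Hypothetical session: the user discussed restaurants, then hotels,
# and now returns to restaurants with an underspecified query.
docs = {
    "d1": "cheap hotels downtown with breakfast",
    "d2": "hotels near the airport with free parking",
    "d3": "italian restaurants downtown with vegetarian menu",
}
history = ["italian restaurants downtown", "hotels near the airport"]
current = "which ones are downtown"

useful = select_useful_history(history, current, docs, "d3")
full_rr = reciprocal_rank(current + " " + " ".join(history), docs, "d3")
selected_rr = reciprocal_rank(current + " " + " ".join(useful), docs, "d3")
```

In this toy session, expanding with the whole history leaves the relevant document at rank 2, while expanding with only the selected query moves it to rank 1. A labeling pass of this kind is one plausible way to produce retrieval-driven supervision for a learned selector; how the original work trains its selector is not specified in this note.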

DHS-ConvQA uses entity-based similarity between history turns and the current question, then applies attention-based re-ranking to weight useful terms. A binary classification task highlights useful terms (predicted as 1) and ignores irrelevant ones (predicted as 0).
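A rough sketch of entity-based history selection followed by binary term labeling, under loud assumptions: entity extraction is replaced by lookup in a hand-written entity list, similarity is plain Jaccard overlap, and the learned binary classifier is replaced by a simple rule. None of the names below (`entities`, `select_turns`, `term_labels`, the `threshold` value) come from DHS-ConvQA.

```python
from typing import List, Set, Tuple

def entities(text: str, vocab: Set[str]) -> Set[str]:
    """Stand-in entity extractor: lowercase tokens found in a known entity list."""
    tokens = {t.strip("?.,!").lower() for t in text.split()}
    return tokens & vocab

def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_turns(history: List[str], question: str,
                 vocab: Set[str], threshold: float = 0.2) -> List[str]:
    """Keep history turns whose entities overlap the question's entities."""
    q_ents = entities(question, vocab)
    return [t for t in history
            if jaccard(entities(t, vocab), q_ents) >= threshold]

def term_labels(turn: str, question: str, vocab: Set[str]) -> List[Tuple[str, int]]:
    """Rule-based stand-in for the binary term classifier:
    1 = term shares an entity with the question, 0 = ignore."""
    q_ents = entities(question, vocab)
    return [(tok, 1 if tok.strip("?.,!").lower() in q_ents else 0)
            for tok in turn.split()]

# Hypothetical entity list and conversation.
vocab = {"louvre", "paris", "sushi", "tokyo"}
history = [
    "What is the Louvre in Paris known for?",
    "Where can I eat sushi in Tokyo?",
]
question = "When does the Louvre open?"

selected = select_turns(history, question, vocab)
highlighted = [tok for tok, label in term_labels(selected[0], question, vocab)
               if label == 1]
```

Here only the Louvre turn survives selection, and within it only the term "Louvre" is labeled 1. A real system would use an entity linker and a trained classifier in place of both stand-ins.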

The key finding generalizes: for both conversational search and conversational QA, selective context is better than full context. This challenges the assumption that more context is always better, an assumption shared by RAG systems and long-context models.

This connects to Why do language models fail in gradually revealed conversations?: the selective history mechanism addresses one specific form of getting lost, where previous turns about a different topic bias the model's interpretation of the current turn. The fix is not better reasoning over more context but better selection of which context to include. It is the retrieval-side complement to Why do language models engage with conversational distractors?, which addresses the same problem at generation time. In both cases models lack the ability to recognize and resist topical diversion, whether it comes from their own context window (selective history) or from user behavior (topic-following).

Two additional failure modes come from conversational memory research (2406.00057). Beyond topic switches, conversational retrieval faces two challenges absent from static database retrieval: (1) time/event-based queries, where users ask "what did we discuss yesterday?" or "summarize Jason's points from January 6th" and retrieval must go by temporal metadata, not semantic similarity; (2) ambiguous queries, where pronouns and demonstratives ("tell me more about that") require surrounding conversational context to disambiguate before retrieval can occur. Standard vector-DB RAG fails both. The combined solution chains table-based search (for metadata), vector-database retrieval (for content), and disambiguation prompting (for resolving ambiguous references). See Why do time-based queries fail in conversational retrieval systems?.
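The chaining idea can be pictured as a router that decides which retrieval path a query should take. The sketch below is an assumption-heavy illustration, not the design from 2406.00057: the keyword patterns are crude (for example, "may" as a verb would be misread as a month), and `Turn`, `route`, `table_search`, and `disambiguation_prompt` are hypothetical names. No model or vector database is called; the prompt builder only assembles text.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class Turn:
    day: date
    speaker: str
    text: str

MONTHS = ("january|february|march|april|may|june|july|august|"
          "september|october|november|december")
TEMPORAL = re.compile(rf"\b(yesterday|today|last week|{MONTHS})\b", re.IGNORECASE)
AMBIGUOUS = re.compile(r"\b(that|this|it|those|them)\b", re.IGNORECASE)

def route(query: str) -> str:
    """Pick a retrieval path: metadata table, disambiguation first, or vectors."""
    if TEMPORAL.search(query):
        return "table"
    if AMBIGUOUS.search(query):
        return "disambiguate"
    return "vector"

def table_search(turns: List[Turn], day: date,
                 speaker: Optional[str] = None) -> List[Turn]:
    """Metadata lookup: filter turns by date and, optionally, by speaker."""
    return [t for t in turns
            if t.day == day and (speaker is None or t.speaker == speaker)]

def disambiguation_prompt(query: str, recent: List[Turn]) -> str:
    """Build a rewrite prompt so an ambiguous query can stand alone."""
    context = "\n".join(f"{t.speaker}: {t.text}" for t in recent)
    return ("Rewrite the query so it is self-contained.\n"
            f"Context:\n{context}\nQuery: {query}")

# Hypothetical conversation log.
turns = [
    Turn(date(2024, 1, 6), "Jason", "We should cache embeddings per session."),
    Turn(date(2024, 1, 6), "Maria", "Agreed, but cap the cache size."),
    Turn(date(2024, 1, 7), "Jason", "Let's revisit the retriever choice."),
]
jason_jan6 = table_search(turns, date(2024, 1, 6), speaker="Jason")
prompt = disambiguation_prompt("tell me more about that", turns[-1:])
```

In this sketch a temporal query goes straight to the table, an ambiguous query is rewritten before any retrieval, and everything else falls through to vector search. A fuller pipeline would presumably send the rewritten query back through `route`, since a disambiguated query may itself turn out to be temporal.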

Inquiring lines that read this note 17

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How should dialogue recommender systems manage conversation history and state?

How can language models sustain linguistic synchrony and intersubjectivity during dialogue?

Why do conversational queries drift away from what triggered them?

How should dialogue systems best leverage conversation history for retrieval?

How should conversational agents balance goal-driven initiative with user control?

What interaction history signals indicate what a participant finds relevant?

How do formal dialogue structures reveal conversation coherence mechanisms?

What makes two conversation turns the same thread rather than different threads?

What role does compression play in language model capability and generalization?

Can compressive memory track what matters most across 35 conversation sessions?

What structural advantages do diffusion language models offer over autoregressive methods?

Can selective history filtering address topic drift that generation-time topic following cannot prevent?

Why do multi-turn conversations degrade AI intent and coherence?

What causes multi-turn dialogue quality to degrade over time?

Related concepts in this collection 6

Why do language models fail in gradually revealed conversations? Explores why LLMs perform 39% worse when instructions arrive incrementally rather than upfront, and whether they can recover from early mistakes in multi-turn dialogue.
selective history prevents one specific mechanism of getting lost (irrelevant context injection)
Can long-context models resolve retriever-reader imbalance? Traditional RAG systems force retrievers to find precise passages because readers had small context windows. Do modern long-context LLMs change what architecture makes sense?
selective history is a retriever-side approach; the reader-side approach may complement
When should retrieval happen during model generation? Explores whether retrieval should occur continuously, at fixed intervals, or only when the model signals uncertainty. Standard RAG retrieves once; long-form generation requires dynamic triggering based on confidence signals.
both argue for adaptive rather than fixed retrieval strategies
Why do language models engage with conversational distractors? Explores why state-of-the-art LLMs struggle to maintain topical focus when users introduce off-topic turns, despite having explicit scope instructions. This gap suggests models lack training signals for ignoring irrelevant directions.
both identify topic boundary management as a critical missing capability: selective history addresses it at retrieval time (filtering irrelevant previous turns), topic-following addresses it at generation time (resisting topical diversion)
Why do dialogue systems lose context when topics return? Stack-based dialogue management removes topics after they're resolved, making it hard for systems to reference them later. Does this structural rigidity explain why conversational AI struggles with topic revisitation?
selective history is the retrieval-side implementation of flexible topic management: rather than rigid stack structures that lose context when topics are popped, selective retrieval dynamically identifies which prior turns are relevant regardless of structural position, enabling topic revisitation without the contamination from intervening topic switches
Why do users drift away from their original information need? When users know their knowledge is incomplete but cannot articulate what's missing, do they unintentionally shift topics? And can real-time systems detect this drift?
ASK-driven drift generates the topic switches that selective history must filter: users in anomalous knowledge states drift unintentionally, creating the irrelevant context injection that entity-based selection mechanisms must detect and exclude
