Why do dialogue systems lose context when topics return?

Stack-based dialogue management removes topics after they're resolved, making it hard for systems to reference them later. Does this structural rigidity explain why conversational AI struggles with topic revisitation?

Synthesis note · 2026-02-22 · sourced from Conversation Architecture Structure

Grosz and Sidner (1986) proposed representing dialogue history as a stack of topics — discourse segments that may not directly follow one another in conversation. The idea was sound: conversations contain interleaved sub-dialogues that need tracking. RavenClaw implemented this as a dialogue stack for handling sub-dialogues.

But the strict structure of a stack is limiting. When a topic is popped from the stack, it is no longer available to provide context. Consider:

BOT: Your total is $15.50 — shall I charge the card you used last time? USER: Do I still have credit from that refund? BOT: Yes, your account is $10 in credit. USER: Ok, great. BOT: Shall I place the order? USER: Yes. BOT: Done. USER: So that used up my credit, right?

The last question refers to the refund credits topic. If that topic was popped from the stack, the system cannot use it to interpret what the user is asking about. Since humans freely revisit and interleave topics with no structural constraint, a stack is too rigid.

The Dialogue Transformer architecture argues for using transformer self-attention as a more flexible alternative. Rather than explicit topic management with push/pop operations, the attention mechanism can attend to any previous turn in the conversation regardless of structural position. This naturally supports topic revisitation without the context loss that stacks impose.

This connects to the multi-turn conversation failure mode. Since Why do language models fail in gradually revealed conversations?, one mechanism of getting lost is losing access to earlier conversation context when topics shift and return. The stack metaphor makes this loss explicit and structural; transformer attention should prevent it in principle, though in practice attention patterns may still favor recent context.

Inquiring lines that read this note 14

This note is a source for these research framings, grouped by the broader line of inquiry each explores. Scan the bold lines of inquiry; follow any specific question forward.

Why do language models struggle with implicit discourse relations?

What happens to anaphoric reference when context exceeds the window?

How can language models sustain linguistic synchrony and intersubjectivity during dialogue?

Why do conversational queries drift away from what triggered them?

How should dialogue recommender systems manage conversation history and state?

How do formal dialogue structures reveal conversation coherence mechanisms?

Why do language models reinforce false assumptions instead of correcting them?

Why do language models fail when users switch between and return to topics?

What makes dialogue-based explanation more successful than monologue?

How does the Question Under Discussion shape what content projects?

Why do multi-turn conversations degrade AI intent and coherence?

Related concepts in this collection 7

This note in its neighbourhood — explore the map, then jump to a related concept in the list below.

Concept map

13 direct connections · 88 in 2-hop network ·medium cluster Open in graph ↗

Why do dialogue systems lose context when topics… What three layers must discourse systems actually … Why do language models fail in gradually revealed … How do readers track segments, purposes, and salie… What six problems must every conversation solve? Why do language models engage with conversational … Does including all conversation history actually h… Why do users drift away from their original inform…

Click a node to walk · click center to open · click Open in graph to see this note in the full knowledge graph

your link semantically near linked from elsewhere

What three layers must discourse systems actually track? Grosz and Sidner's 1986 framework proposes that discourse requires simultaneously tracking linguistic segments, speaker purposes, and salient objects. Understanding why all three are necessary helps explain where current AI systems structurally fail.
Grosz & Sidner's framework; the attentional component is what stacks attempt to manage
Why do language models fail in gradually revealed conversations? Explores why LLMs perform 39% worse when instructions arrive incrementally rather than upfront, and whether they can recover from early mistakes in multi-turn dialogue.
topic revisitation failure is a specific mechanism of getting lost
How do readers track segments, purposes, and salience together? Can discourse processing actually happen in parallel rather than sequentially? This matters because understanding how readers coordinate multiple layers of meaning at once reveals where AI systems break down in comprehension.
flexible topic management is required FOR coherence tracking
What six problems must every conversation solve? Schegloff's Conversation Analysis identifies six universal organizational challenges that speakers navigate in all talk-in-interaction. Understanding these helps explain why current AI dialogue systems fall short of human fluency.
topic management is a specific instantiation of Schegloff's "overall structural organization" generic order; the stack-vs-attention debate is about how to solve this particular organizational problem
Why do language models engage with conversational distractors? Explores why state-of-the-art LLMs struggle to maintain topical focus when users introduce off-topic turns, despite having explicit scope instructions. This gap suggests models lack training signals for ignoring irrelevant directions.
complementary aspects of topic structure: topic-following resists LEAVING appropriate topics; topic management handles RETURNING to previous topics; together they define the full problem space of conversational topic continuity
Does including all conversation history actually help retrieval? Conversational search systems typically use all previous context to understand current queries. But do topic switches in multi-turn conversations inject noise that degrades performance rather than helps it?
selective history is the retrieval-side implementation of flexible topic management: rather than rigid stack structures, it dynamically identifies which prior conversation turns are relevant to the current query, enabling effective topic revisitation without context contamination from intervening topic switches
Why do users drift away from their original information need? When users know their knowledge is incomplete but cannot articulate what's missing, do they unintentionally shift topics? And can real-time systems detect this drift?
ASK explains WHY topics shift unintentionally: users in anomalous knowledge states drift into sub-topics without awareness, creating the very topic switches that flexible revisitation structures must accommodate

Why do dialogue systems lose context when topics return?

Inquiring lines that read this note 14

Related concepts in this collection 7

Related papers in this collection 8

Search by related questions 4