Why do large language models produce generic responses to vague queries?
When users fail to specify contextual details in prompts, do LLMs collapse multiple training contexts into a single generic response? Understanding this failure mode could improve how we scaffold user-model interaction.
Context collapse, as introduced by Meyrowitz and elaborated by danah boyd, describes how electronic media merge previously separated audiences into a single communicative context, forcing speakers to adopt one register that fully satisfies none of them. Once Stokely Carmichael's speeches were broadcast on television and radio, the rhetoric he had used with Black audiences became audible to everyone, and he had to choose which audience to speak to. The same dynamic appears on social media: posts persist, replicate, and reach audiences the speaker never intended.
Kasirzadeh and Gabriel argue that LLM conversation produces a different form of context collapse. This collapse does not come from merging audiences (there is only one user) but from inadequate scaffolding combined with the model's defaults. When a user asks for advice on a "work conflict" without specifying their industry, the model cannot infer the situational boundaries, so it blends training-data priors from corporate, academic, and gig-economy contexts into a single generic response. The collapse happens between the contexts the model was trained on, not between the user's actual audiences.
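The blending described above can be pictured as a mixture over training contexts. The sketch below is a deliberately crude caricature, not how any real model computes its output and not a method from Kasirzadeh and Gabriel: the priors, the cue words, and the `boost` factor are all hypothetical. It shows that a query with no contextual cues leaves the distribution over contexts at its prior, so the response draws on every context at once, while a single specific detail concentrates it.

```python
import math

# Hypothetical priors: how often each workplace context appears in training data.
PRIORS = {"corporate": 0.5, "academic": 0.3, "gig_economy": 0.2}

# Hypothetical cue words that would let the model narrow down the context.
CUES = {
    "corporate": {"manager", "hr", "quarterly"},
    "academic": {"advisor", "tenure", "lab"},
    "gig_economy": {"platform", "rating", "driver"},
}


def context_posterior(query: str, boost: float = 10.0) -> dict:
    """Crude Bayes-style update: each cue word present multiplies a context's weight."""
    words = set(query.lower().split())
    scores = {ctx: p * boost ** len(words & CUES[ctx]) for ctx, p in PRIORS.items()}
    total = sum(scores.values())
    return {ctx: s / total for ctx, s in scores.items()}


def entropy_bits(dist: dict) -> float:
    """Shannon entropy in bits: higher means the answer mixes more contexts."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


vague = context_posterior("advice on a work conflict")
specific = context_posterior("my advisor and i disagree about lab credit")
print(vague, round(entropy_bits(vague), 2))
print(specific, round(entropy_bits(specific), 2))
```

In this toy, the vague query matches no cue words, so its distribution equals the prior and its entropy stays high; the query mentioning an advisor and a lab puts almost all of the weight on the academic context.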
This distinction matters because it locates the failure in a different place. Social-media context collapse is a property of the platform and its visibility settings. LLM context collapse is a property of the user-model interface: the user's mistaken expectation that the model has human-like pragmatic capacities to infer the situation, combined with the model's fallback to training-data defaults when that expectation is not met. Mitigations differ accordingly. Social-media remedies focus on audience controls; LLM remedies focus on context verification, query-back protocols, and user-driven scaffolding tools.
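Context verification and query-back can be sketched as a simple gate in front of generation. This is a hypothetical illustration, not a protocol specified in the source: it assumes some upstream estimate of how likely each context is (here passed in as a plain dictionary) and an arbitrary confidence `threshold`. When one context clearly dominates, the reply states the assumed context so the user can correct it; otherwise it asks the user to choose rather than blending all contexts.

```python
from typing import Dict


def respond_or_query_back(posterior: Dict[str, float], threshold: float = 0.8) -> str:
    """Hypothetical gate: answer only when one context clearly dominates."""
    best_ctx, best_p = max(posterior.items(), key=lambda kv: kv[1])
    if best_p >= threshold:
        # Context verification: make the assumed context explicit and correctable.
        return f"Assuming the {best_ctx} context (correct me if not): <context-specific advice>"
    # Query-back: list candidate contexts, most likely first, and ask the user.
    ranked = sorted(posterior, key=posterior.get, reverse=True)
    return "Before I answer, which setting is this: " + ", ".join(ranked) + "?"


print(respond_or_query_back({"corporate": 0.5, "academic": 0.3, "gig_economy": 0.2}))
print(respond_or_query_back({"corporate": 0.02, "academic": 0.95, "gig_economy": 0.03}))
```

The threshold is the main design choice: set it too low and the gate answers from a blended default anyway; set it too high and it questions the user even when the context is already clear.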
Inquiring lines that use this note as a source
This note is a source for these synthesized inquiries.
- How do training-data priors influence model defaults when context is ambiguous?
- Should LLMs query users back when presented with under-specified scenarios?
- How does context collapse affect what language models can meaningfully communicate?
- Why does removing language from its context destroy what makes it work?
- How do fixed pragmatic templates prevent models from understanding context?
- Why do NLP benchmarks systematically exclude ambiguous test cases from evaluation?
- Why do NLP benchmarks exclude ambiguous instances from evaluation?
- Why do standard RAG systems struggle with pronouns and demonstratives?
- Why do LLMs produce semantically acceptable but pragmatically disengaged responses?
- Why do pretrained retrievers struggle with ambiguous or implicit queries?
- How does tokenization toward corpus mean affect downstream output diversity?
- Why do users rephrase prompts toward median register over specialized phrasing?
- What role does prompt context play in preventing genuine addressee modeling in generation?
- Can we predict when a specific prompt will fail on a given question?
- How does ambiguity detection connect to models' ability to ask clarifying questions?
- Can context windows and RAG actually change what language models generate?
- Why do NLP benchmarks hide LLM failures in ambiguity handling?
- Can prompt engineering and external knowledge bases fix ambiguity recognition failures?
- Why do language models struggle with context-dependent pragmatic interpretation?
- What makes specific-facet questions outperform generic need-rephrasing requests?
- Why do language models prefer certain response styles regardless of what the prompt asks?
- Why does training data not function as a searchable corpus?
- Why do specific clarifying questions outperform rephrased versions of user needs?
- Why do specific clarifying questions outperform generic requests for clarity?
- Why do language models fail at understanding ambiguous or complex requirements?
- Can question-only features replace model uncertainty checks at scale?
- What other pragmatic prompt features have unstable effects?
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Conversational Alignment with Artificial Intelligence in Context
- Large Language Model Reasoning Failures
- The Curse of Recursion: Training on Generated Data Makes Models Forget
- Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation
- Everything Everywhere All at Once: LLMs Can In-Context Learn Multiple Tasks in Superposition
- Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity
- Linguistic Calibration of Long-Form Generations
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
Original note title
Context collapse in LLM conversation arises from scaffolding failure not audience flattening