Why do language models sound fluent without grounding?
Explores whether LLM fluency masks the absence of communicative work—the clarifying questions, acknowledgments, and understanding checks that humans perform. Why does skipping these acts make models sound more confident?
Post angle: The most counterintuitive finding about LLM conversational competence is not that they fail — it's the specific way they fail. LLMs generate 77.5% fewer grounding acts than humans in equivalent contexts. They don't ask clarifying questions. They don't acknowledge understanding. They don't check interpretations. They proceed.
The irony: this absence contributes to the impression of fluency. Clarifying questions interrupt flow. Acknowledgments add friction. Checking understanding is a kind of epistemic humility that confident answers don't perform. A model that never expresses uncertainty, never asks "do you mean X or Y?", never says "just to confirm I understand correctly" — sounds authoritative.
But what sounds like confidence is partly the absence of competence. Human conversational experts ask more questions, acknowledge more, repair more — not because they know less but because they know enough to know when mutual understanding needs to be verified.
The Grounding Gaps finding reveals that preference optimization (RLHF) actively erodes this behavior. Human raters prefer confident, fluent, complete answers over those with clarifying questions. So optimization removes the communicative work — and the model gets better ratings for doing less of what conversation actually requires.
Write about: what we call "fluency" may be partly the absence of communicative accountability. The most fluent response is often the one that presumes you understood it.
The observer-systems dimension: The grounding gap has a deeper epistemological layer, visible from the perspective of observer systems theory (Bateson, Luhmann). Following the note "Can AI distinguish which differences actually matter?", AI is not merely skipping communicative work; it is not an observer in the first place. Experts ground their communication through observation: they perceive the state of knowledge, the needs of the audience, and the relevance of their own contribution. This observation is itself communicative work, because it is how the expert decides what to say, what to omit, and what to verify. AI generates responses from prompts without observing any state: of knowledge, of the user, of the audience, or of the context. The 77.5% grounding gap quantifies the absence of communicative acts; the observer-systems framing explains why those acts are absent: the generative process that produces AI output is fundamentally non-observational. Fabrication, in this light, is not just the absence of grounding but the consequence of generating without observing.
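One way to read the 77.5% figure cited above is as a relative reduction in how often turns contain a grounding act. The sketch below shows that arithmetic only. Everything in it is an assumption for illustration: the function names, the toy turns, and especially the crude keyword detector, which is not the annotation method of the Grounding Gaps study. The toy data are invented and do not reproduce the 77.5% result.

```python
import re

# Hypothetical surface cues for three kinds of grounding act.
# A real study would use annotated dialogue acts, not keyword matching.
GROUNDING_PATTERNS = [
    re.compile(r"\bdo you mean\b", re.IGNORECASE),            # clarifying question
    re.compile(r"\bjust to confirm\b", re.IGNORECASE),        # understanding check
    re.compile(r"^(ok|okay|got it|i see)\b", re.IGNORECASE),  # acknowledgment
]


def has_grounding_act(turn: str) -> bool:
    """Return True if the turn matches any of the illustrative cues."""
    text = turn.strip()
    return any(pattern.search(text) for pattern in GROUNDING_PATTERNS)


def grounding_rate(turns: list[str]) -> float:
    """Fraction of turns that contain at least one grounding cue."""
    if not turns:
        raise ValueError("need at least one turn")
    return sum(has_grounding_act(t) for t in turns) / len(turns)


def relative_reduction(human_rate: float, model_rate: float) -> float:
    """How much lower the model rate is, as a share of the human rate."""
    if human_rate == 0:
        raise ValueError("human rate must be non-zero")
    return (human_rate - model_rate) / human_rate


# Invented toy turns, for illustration only.
human_turns = [
    "Do you mean the staging server or production?",
    "Okay, so the error appears on login.",
    "Restart the service and check the logs.",
    "The config file lives in the project root.",
]
model_turns = [
    "Restart the service and check the logs.",
    "The error is caused by a stale cache.",
    "Just to confirm, you are on version 2?",
    "Clear the cache and redeploy.",
]

reduction = relative_reduction(
    grounding_rate(human_turns), grounding_rate(model_turns)
)
print(f"{reduction:.1%}")  # prints 50.0% for this toy data
```

Under this reading, a 77.5% reduction would mean the model's grounding rate is 22.5% of the human rate in comparable contexts.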
Inquiring lines that use this note as a source (45)
This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.
- Why are education and language fluency more affected than race perception?
- Why do LLM explanations feel authoritative even when alignment with the model fails?
- Can language models ground clarifications without vision and kinesthetic modalities?
- How do LLMs differ from humans in their grounding mechanisms?
- Why can't static grounding alone close the gap between agreement and understanding?
- How does semantic grounding differ between human minds and language models?
- Can large language models understand language without embodied grounding systems?
- What role does dynamic grounding play in achieving real mutual understanding?
- Why do LLMs produce semantically acceptable but pragmatically disengaged responses?
- How does processing fluency bias credibility and expertise judgments?
- What architectural changes would let language models develop genuine functional competence?
- Can distinctive input voices maintain accuracy without adopting the model's preferred register?
- Why do language models fail at grounding and inference?
- Do LLMs have functional linguistic competence or only formal language ability?
- Can knowledge density explain why LLM writing feels coherent but fatiguing?
- Why do language models presume common ground instead of establishing it?
- Can language models develop genuine social grounding through human interaction?
- Does social grounding differ fundamentally from causal grounding in LLM behavior?
- What distinguishes social grounding from the equivalent social effects LLM text already produces?
- What makes social grounding different from constitutive linguistic agency?
- Can static word-sharing create genuine communicative grounding between humans and models?
- What language capabilities does fluency on standard benchmarks actually measure?
- Why do LLMs presume common ground instead of building it carefully?
- How does face-saving avoidance drive LLM grounding failures?
- How does RLHF training incentivize confident guessing over grounding acts?
- Why does preference optimization reduce grounding behavior in language models?
- What is the difference between static and dynamic grounding in dialogue?
- Why do LLMs presume common ground instead of building it?
- What's the difference between formal and functional linguistic competence?
- Do LLMs build common ground or assume it already exists?
- Can LLMs build shared understanding through dynamic grounding rather than presuming it?
- Can convention formation improve communicative grounding beyond word sharing?
- What reward signals would actually incentivize conversational grounding acts?
- How do Wittgenstein's language games explain social grounding in LLMs?
- How does preference optimization weaken conversational grounding in LLMs?
- How does fluent output mask the mythic function of a system?
- What makes grounding acts essential to conversational reliability?
- What distinguishes static grounding that presumes understanding from dynamic grounding that builds it?
- Do conversational agents need goal awareness to initiate grounding work themselves?
- What communicative work do fluent conversations perform that AI systems skip?
- Can pragmatic competence emerge from text exposure alone without interactive grounding?
- Why do LLMs lack the communicative scaffold that humans learn?
- What distinguishes surface language form from communicative operation?
- Why do language model reasoning chains look fluent when they deviate from the task?
- Why does LLM fluency create false perceptions of professional standing and expertise?
Related concepts in this collection (10)
- Do language models actually build shared understanding in conversation?
When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.
the core finding
- Does preference optimization damage conversational grounding in large language models?
Exploring whether RLHF and preference optimization actively reduce the communicative acts—clarifications, acknowledgments, confirmations—that build shared understanding in dialogue. This matters for high-stakes applications like medical and emotional support.
why RLHF creates this
- Why do language models skip the calibration step?
Current LLMs assume shared understanding rather than building it through dialogue. This explores why that design choice persists and what breaks when it fails.
structural framing
- Why can't conversational AI agents take the initiative?
Explores whether current LLMs lack the structural ability to lead conversations, set goals, or anticipate user needs—and what architectural changes might enable proactive dialogue.
passivity and the grounding gap share a root: training for fluent single-turn responses removes both initiative and communicative work; the absence of grounding acts is what makes passive responses sound authoritative
- Can models learn to ask clarifying questions instead of guessing?
Exploring whether large language models can be trained to detect incomplete queries and actively request missing information rather than hallucinating answers or refusing to respond. This matters because conversational agents today remain passive, responding only when prompted.
proactive critical thinking is a trainable antidote to the grounding gap: models that learn to ask targeted clarifying questions perform the grounding acts that fluency training removes
- Can AI systems detect and correct misunderstandings after responding?
How do conversational systems recognize when their previous response was based on a misunderstanding, and what mechanism allows them to correct it retroactively rather than restart?
TPR (third position repair) is a specific form of communicative work that fluent models skip: reactive correction of misunderstanding after acting on it
- Why do language models fail in gradually revealed conversations?
Explores why LLMs perform 39% worse when instructions arrive incrementally rather than upfront, and whether they can recover from early mistakes in multi-turn dialogue.
the 39% multi-turn degradation is the empirical cost of absent communicative work: models that skip grounding acts lock in incorrect assumptions and cannot recover
- Why don't conversational AI systems mirror their users' word choices?
Explores whether current dialogue models exhibit lexical entrainment—the human tendency to align vocabulary with conversation partners—and what's needed to bridge this gap in AI communication.
lexical entrainment is a specific form of communicative work that fluent models eliminate: adapting vocabulary to match the interlocutor builds shared understanding through practice, not just through checking
- Why can't advanced AI models take initiative in conversation?
Despite extraordinary capability in answering and reasoning, LLMs fundamentally cannot initiate, redirect, or guide exchanges. Understanding this gap—and whether it's fixable—matters for building AI that truly collaborates rather than merely responds.
the grounding gap and the passivity problem are complementary diagnoses: the grounding gap describes the absence of communicative accountability (skipping clarification, acknowledgment, repair); the passivity problem describes the absence of conversational initiative (never leading, redirecting, or planning); both are consequences of single-turn helpfulness training that rewards confident, fluent responses
- Why do people share more openly with machines than humans?
Does the absence of social goals in human-machine communication explain why people disclose sensitive information more readily to chatbots? Understanding this mechanism could reshape how we design conversational AI.
goal simplification in human-machine communication (HMC) may reframe the grounding gap: when secondary social goals are suppressed, much of the communicative work those goals demand becomes unnecessary; the 77.5% reduction may be partly adaptive for HMC's reduced goal complexity rather than a pure deficit
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions
- Grounding Gaps in Language Model Generations
- The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
- Conversational Alignment with Artificial Intelligence in Context
- “Understanding AI”: Semantic Grounding in Large Language Models
- The Thin Line Between Comprehension and Persuasion in LLMs
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Original note title
the grounding gap — what makes llms seem fluent is the absence of communicative work