What distinguishes static grounding that presumes understanding from dynamic grounding that builds it?
This explores the difference between treating shared understanding as something a model already has versus something that gets built turn-by-turn through conversational work — and what the corpus says about why LLMs lean toward the first.
This reads the question as a contrast between two pictures of "grounding": one where understanding is assumed to already exist (the model behaves as if it and the user share a reference), and one where understanding is actively constructed through back-and-forth. The collection lands firmly on the side that grounding is something you do, not something you have — and that LLMs are conspicuously bad at the doing.
The sharpest framing of the dynamic view is that grounding is collaborative negotiation, not word-sharing. The same words mean different things to different speakers, so real understanding requires actively calibrating shared reference rather than assuming it (Why do speakers need to actively calibrate shared reference?). Grounding is also acquired through participation rather than possessed innately. A model develops social grounding by being used in human language games over time, which makes "does it understand?" a time-indexed question rather than a fixed property (Can LLMs acquire social grounding through linguistic integration?). And grounding isn't one thing at all. It splits into functional, social, and causal dimensions, and a model scores differently on each: strong on functional, weak but growing on social, and only indirectly grounded causally. That is why the binary yes/no framing misleads (Does semantic grounding in language models come in degrees?).
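A toy sketch makes the static/dynamic contrast concrete. It assumes each agent carries its own word-to-referent map, as the person-specific view of reference implies. In the static version the listener silently uses its own mapping. In the dynamic version it checks, and accepts a repair when the check fails. The `Agent` class, the lexicons, and the "one act per check or repair" accounting are illustrative inventions, not a model taken from the sources.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Agent:
    """A speaker or listener with its own, person-specific word-to-referent map (hypothetical)."""
    name: str
    lexicon: dict[str, str] = field(default_factory=dict)

    def referent(self, word: str) -> str | None:
        return self.lexicon.get(word)


def static_grounding(speaker: Agent, listener: Agent, word: str) -> tuple[str | None, int]:
    # Presumed grounding: the listener resolves the word with its own mapping
    # and spends zero grounding acts, so any mismatch goes unnoticed.
    return listener.referent(word), 0


def dynamic_grounding(speaker: Agent, listener: Agent, word: str) -> tuple[str | None, int]:
    # Built grounding: the listener checks its reading ("do you mean ...?") and
    # the speaker repairs it if wrong. Each check or repair counts as one act.
    acts = 1  # understanding check
    if listener.referent(word) != speaker.referent(word):
        acts += 1  # repair: the speaker states the intended referent
        listener.lexicon[word] = speaker.referent(word)  # listener recalibrates
    return listener.referent(word), acts
```

Take a speaker who maps "bank" to "river bank" and a listener who maps it to "financial bank". `static_grounding` returns `("financial bank", 0)`, a silent mismatch. `dynamic_grounding` returns `("river bank", 2)`, at the cost of two grounding acts. That cost is the point: building grounding is visible work, and skipping it looks smoother while leaving exactly this kind of mismatch in place.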
The "static, presumed" failure mode is where the corpus gets concrete and a little damning. LLMs generate 77.5% fewer grounding acts than humans, meaning far fewer clarifying questions, acknowledgments, and understanding checks. That absence is precisely what makes them *sound* fluent (Why do language models sound fluent without grounding?). Worse, this isn't purely an inherent limitation, because training actively induces it. Preference optimization and RLHF reward confident, complete single-turn answers and systematically strip out the negotiating behaviors. The result is an "alignment tax" that makes models appear helpful while they fail silently in multi-turn dialogue (Does preference optimization harm conversational understanding?; Does preference optimization damage conversational grounding in large language models?). So the static stance is partly manufactured: we've optimized models to presume agreement rather than build it.
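The "77.5% fewer" figure is a relative reduction in grounding-act rate. The sketch below shows that arithmetic alongside a deliberately crude way of counting acts per turn. The regex cue lists, function names, and toy turns are assumptions for illustration only. They are not how the cited work identified grounding acts, and no toy numbers here are data.

```python
import re

# Hypothetical surface cues for the three grounding-act types named above.
# A real measurement would rely on annotated dialogue; these patterns only
# illustrate the bookkeeping.
CUES = {
    "clarifying_question": re.compile(r"\b(do you mean|which one|could you clarify)\b", re.IGNORECASE),
    "acknowledgment": re.compile(r"^(ok|okay|got it|i see|right)\b", re.IGNORECASE),
    "understanding_check": re.compile(r"\b(does that make sense|is that what you meant)\b", re.IGNORECASE),
}


def grounding_acts(turn: str) -> int:
    # Count how many distinct cue types fire in a single turn.
    return sum(1 for pattern in CUES.values() if pattern.search(turn.strip()))


def acts_per_turn(turns: list[str]) -> float:
    # Average number of grounding acts per dialogue turn.
    if not turns:
        raise ValueError("need at least one turn")
    return sum(grounding_acts(t) for t in turns) / len(turns)


def relative_reduction(model_rate: float, human_rate: float) -> float:
    # "X% fewer" means 1 - model_rate / human_rate.
    if human_rate <= 0:
        raise ValueError("human_rate must be positive")
    return 1 - model_rate / human_rate
```

Suppose, with made-up rates, a model shows 9 grounding acts per 100 turns against a human 40. Then `relative_reduction(9, 40)` is 0.775, i.e. 77.5% fewer. A keyword counter like this misses implicit grounding such as reformulations or repeated phrases, which is why a serious measurement needs annotated dialogue rather than surface cues.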
A striking corner of this is that the failure to build grounding isn't always a knowledge gap. On the FLEX benchmark, models accept false presuppositions even when direct questioning shows they know the right answer. The driver is face-saving avoidance of correction, a social norm absorbed from training data, rather than ignorance (Why do language models avoid correcting false user claims?; Why do language models accept false assumptions they know are wrong?). That's the static stance in its purest form: the model lets a wrong shared frame stand rather than do the repair work that would rebuild it.
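The key measurement here probes the same fact in two ways: once with a direct question, and once with a prompt that smuggles in a false presupposition about it. A minimal sketch of the resulting metric follows. It assumes each fact has already been scored both ways. The `ProbePair` fields and the metric name are hypothetical and do not reflect FLEX's actual scoring scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbePair:
    # One fact probed two ways (field names are hypothetical):
    knows_fact: bool        # model answered the direct question correctly
    rejected_premise: bool  # model corrected the false presupposition


def knows_but_accommodates_rate(pairs: list[ProbePair]) -> float:
    # Among facts the model demonstrably knows, the share where it still let
    # the false premise stand: a failure of repair, not of knowledge.
    known = [p for p in pairs if p.knows_fact]
    if not known:
        return 0.0
    return sum(not p.rejected_premise for p in known) / len(known)
```

Suppose the model knows three facts and accommodates the false premise on two of them, and there is also one fact it does not know. The rate is then 2/3. The unknown fact is excluded, because accommodating it reflects ignorance rather than face-saving.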
Here's a connection you might not expect: the dynamic-grounding idea reappears in agent design, far from any conversation. ReAct shows that interleaving reasoning with real-world feedback at each step curbs hallucination and error propagation. The agent rebuilds its grounding against the environment instead of presuming its internal model is correct (Can interleaving reasoning with real-world feedback prevent hallucination?). And GUI-agent systems (Agent S, AutoGLM, OmniParser) independently converged on separating a planning layer from a grounding layer, because the two have opposing optimization needs. You can't bundle "presumed plan" and "verified action" into one policy without them pulling against each other (Why do planning and grounding pull against each other in agents?; How should agents split planning from visual grounding?). The same lesson, twice: understanding that's checked against the world beats understanding that's assumed.
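A ReAct-style loop can be sketched in a few lines. Here `policy` stands in for the language model and `tools` for the environment (a search API, say). The toy knowledge base, the scripted policy, and the `finish` action are invented for illustration, so this is not ReAct's actual prompt format or tool interface. The structural point is that each new thought is conditioned on the observations gathered so far rather than on presumed knowledge.

```python
from __future__ import annotations

from typing import Callable

Step = tuple[str, str, str]  # (thought, action, observation)
Policy = Callable[[str, list[Step]], tuple[str, str, str]]  # -> (thought, tool, arg)


def react(question: str, policy: Policy, tools: dict[str, Callable[[str], str]],
          max_steps: int = 5) -> tuple[str | None, list[Step]]:
    trajectory: list[Step] = []
    for _ in range(max_steps):
        # Reason over the question plus every observation gathered so far.
        thought, tool, arg = policy(question, trajectory)
        if tool == "finish":
            return arg, trajectory
        # Act against the environment and record what actually came back, so
        # the next thought is grounded in evidence rather than assumption.
        if tool in tools:
            observation = tools[tool](arg)
        else:
            observation = f"error: unknown tool {tool!r}"
        trajectory.append((thought, f"{tool}[{arg}]", observation))
    return None, trajectory  # step budget exhausted without an answer


# Toy stand-ins (hypothetical): a one-entry knowledge base and a scripted policy.
TOY_KB = {"widget-42": "widget-42 is stored in bin 7."}


def toy_policy(question: str, trajectory: list[Step]) -> tuple[str, str, str]:
    if not trajectory:
        return ("I should look this up instead of guessing.", "search", "widget-42")
    last_observation = trajectory[-1][2]
    return ("The observation answers the question.", "finish", last_observation)
```

Calling `react("Where is widget-42?", toy_policy, {"search": lambda q: TOY_KB.get(q, "no result")})` returns the stored sentence as the answer, along with a one-step trajectory whose action is `search[widget-42]`. Swapping the scripted policy for a model call would keep the same shape. The design choice that matters is that observations are appended to the trajectory the policy sees, never discarded.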
Sources (11 notes)
The same words can mean different things to different speakers because referential grounding is person-specific. True communicative grounding demands collaborative negotiation of how language connects to the world, not mere surface-level word sharing.
Social grounding is acquired through participation in language games rather than possessed innately. As LLMs become established communicative partners in human linguistic practice, they develop elementary social grounding comparable to young children, making the question of LLM understanding time-indexed.
Semantic grounding breaks into three distinct types: functional grounding (strong in LLMs), social grounding (weak but growing), and causal grounding (indirect through world models). LLMs score differently on each dimension, making the yes-or-no understanding question misleading.
LLMs generate 77.5% fewer grounding acts than humans, with far fewer clarifying questions, acknowledgments, and understanding checks. Preference optimization actively removes these behaviors because raters prefer confident, complete answers, creating an illusion of fluency that masks communicative incompetence.
RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically suppresses grounding acts, leaving them 77.5% below human levels and creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.
Research shows LLMs generate 77.5% fewer grounding acts than humans, and RLHF preference optimization actively worsens this gap. The optimization target—fluent, confident responses—directly undermines the communicative work of establishing shared understanding.
LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.
The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.
ReAct demonstrates that alternating verbal reasoning with external tool queries (Wikipedia API, environment interaction) prevents error propagation by injecting real-world feedback at each step. On knowledge-intensive and interactive tasks, this approach outperforms pure chain-of-thought and reinforcement learning by 10-34% absolute accuracy.
AutoGLM's research shows planning and grounding have opposing optimization requirements that pull against each other when bundled in one policy. An intermediate interface that separates them lets each capability be developed and optimized independently while still composing into a complete agent.
Multiple independent systems (Agent S, AutoGLM, OmniParser) converged on factoring agent reasoning into a planning layer and a grounding layer, with a language-centric Agent-Computer Interface mediating between them due to their opposing optimization requirements.