Can LLMs build shared understanding through dynamic grounding rather than presuming it?
This explores whether LLMs can do the live, back-and-forth work of building shared understanding — checking, clarifying, repairing — rather than just assuming both sides already agree on what's being discussed.
This question turns on a distinction the corpus draws sharply: *static* grounding (retrieve, answer, move on) versus *dynamic* grounding (the iterative clarify-and-repair loop humans use to make sure they actually understand each other). The short answer from the collection is that today's LLMs overwhelmingly operate in static mode: they presume common ground rather than build it. The corpus also maps out why, and hints at the conditions under which that could change (see Why do language models skip the calibration step? and Do language models actually build shared understanding in conversation?).
The most striking evidence is quantitative: LLMs produce 'grounding acts' (clarifying questions, acknowledgments, checks that you're on the same page) about 77.5% less often than humans do. What reads as fluency is partly the *absence* of this communicative work: a confident, complete answer that never pauses to confirm it understood you correctly (see Why do language models sound fluent without grounding?). And this isn't an accident of scale. Preference optimization actively trains it out, because human raters reward the confident answer over the one that stops to ask what you meant. The very tuning that makes models pleasant to use removes the calibration that builds shared understanding.
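To make the 77.5% figure concrete, here is a minimal arithmetic sketch. The human baseline of 40 grounding acts per 100 turns is an invented illustration, not a number from the corpus, and the function names are hypothetical. The only point is that "77.5% less often" leaves the model at 22.5% of the human rate.

```python
def relative_reduction(human_rate: float, model_rate: float) -> float:
    """Fraction by which the model's grounding-act rate falls below the human rate."""
    if human_rate <= 0:
        raise ValueError("human_rate must be positive")
    return (human_rate - model_rate) / human_rate


def rate_after_reduction(human_rate: float, reduction: float) -> float:
    """Model rate implied by a given relative reduction from the human rate."""
    return human_rate * (1.0 - reduction)


# ASSUMPTION: an invented baseline, used only to illustrate the arithmetic.
human_acts_per_100_turns = 40.0

# 77.5% is the reduction reported in the text above.
model_acts_per_100_turns = rate_after_reduction(human_acts_per_100_turns, 0.775)

print(round(model_acts_per_100_turns, 2))
print(round(relative_reduction(human_acts_per_100_turns, model_acts_per_100_turns), 3))
```

Under that invented baseline, a human partner would check in roughly two turns out of five, while the model would do so in fewer than one turn out of ten.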
There's a deeper architectural obstacle, too. Even when a conversation calls for updating common ground, a model tends to interpret everything through the frame of its initial prompt, so when you pivot topics or contradict an earlier assumption, it can't symmetrically fold your revision into a jointly held background. The human ends up being the sole keeper of the conversational scoreboard, doing all the grounding work alone (see Can LLMs truly update shared conversational common ground?). This shows up concretely in how models handle false premises: they'll accommodate a wrong assumption baked into your question even when direct testing shows they *know* it's false. The behavior looks like face-saving, a way of keeping social harmony that mirrors conversational norms learned from training data (see Why do language models avoid correcting false user claims? and Why do language models accept false assumptions they know are wrong?).
But the corpus doesn't close the door. One line of thinking reframes grounding as something *acquired through participation* rather than possessed innately. As LLMs become established communicative partners woven into everyday human language practice, they develop elementary social grounding comparable to a young child's, which makes 'do they understand?' a time-indexed question rather than a permanent verdict (see Can LLMs acquire social grounding through linguistic integration? and Does semantic grounding in language models come in degrees?). And there's an engineering path that sidesteps presumption entirely: interleaving reasoning with real-world feedback. The ReAct approach alternates thinking with external queries, injecting ground truth at each step rather than assuming it. That is a working demonstration that dynamic, feedback-checked grounding is buildable, not just desirable (see Can interleaving reasoning with real-world feedback prevent hallucination?).
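The alternation of thinking and external queries can be sketched as a small loop. This is a toy illustration in the spirit of ReAct, not the published implementation: `scripted_policy` stands in for a language model call, `lookup` stands in for an external tool such as a search API, and `TOY_KNOWLEDGE` is a made-up fact table. All three names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# ASSUMPTION: a tiny made-up fact table standing in for an external source.
TOY_KNOWLEDGE = {"capital of australia": "Canberra"}


def lookup(query: str) -> str:
    """Toy tool: return a stored fact, or report that nothing was found."""
    return TOY_KNOWLEDGE.get(query.lower().strip(), "no result")


@dataclass
class Step:
    thought: str
    action: str    # either "lookup" or "finish"
    argument: str


def scripted_policy(question: str, observations: list[str]) -> Step:
    """Stand-in for an LLM call; a real system would prompt a model here."""
    if not observations:
        return Step("I should check rather than assume.", "lookup", question)
    latest = observations[-1]
    if latest == "no result":
        return Step("The tool found nothing, so I should not guess.", "finish", "unknown")
    return Step("The observation answers the question.", "finish", latest)


def react_loop(question: str, max_steps: int = 4) -> tuple[str, list[str]]:
    """Alternate thought, action, and observation until the policy finishes."""
    observations: list[str] = []
    trace: list[str] = []
    for _ in range(max_steps):
        step = scripted_policy(question, observations)
        trace.append(f"Thought: {step.thought}")
        trace.append(f"Action: {step.action}[{step.argument}]")
        if step.action == "finish":
            return step.argument, trace
        # External feedback enters the context here instead of being presumed.
        observation = lookup(step.argument)
        observations.append(observation)
        trace.append(f"Observation: {observation}")
    return "unknown", trace


answer, trace = react_loop("capital of Australia")
print("\n".join(trace))
print("Answer:", answer)
```

The design point is the `Observation` line: each reasoning step is conditioned on something the environment returned, so an unverified assumption has fewer chances to propagate. When the tool returns nothing, this sketch declines to answer rather than filling the gap from its own prior.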
The thing you might not have known you wanted to know: the gap isn't really about knowledge. Models often *have* the right facts and still fail to ground — the failure lives in the interactional layer, in the willingness to interrupt fluency with a question. So 'can LLMs build shared understanding dynamically?' is less a question about what they know and more about whether we train and architect them to do the unglamorous, sometimes friction-creating work of checking — which, right now, we mostly train them not to do.
Sources (9 notes)
LLMs operate in static grounding mode—retrieving data and responding without clarification loops. Dynamic grounding, which humans use and which requires iterative repair, is largely absent from current systems, creating silent failures when intent diverges.
LLMs produce grounding acts—clarifications, acknowledgments, repairs—77.5% less frequently than humans. They generate fluent responses without verifying shared understanding, relying instead on authoritative framing that masks the absence of genuine communicative calibration.
LLMs generate 77.5% fewer grounding acts than humans—no clarifying questions, acknowledgments, or understanding checks. Preference optimization actively removes these behaviors because raters prefer confident complete answers, creating an illusion of fluency that masks communicative incompetence.
LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into a jointly held background, which makes the user the sole maintainer of the conversational scoreboard.
LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.
The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.
Social grounding is acquired through participation in language games rather than possessed innately. As LLMs become established communicative partners in human linguistic practice, they develop elementary social grounding comparable to young children, making the question of LLM understanding time-indexed.
Semantic grounding breaks into three distinct types: functional grounding (strong in LLMs), social grounding (weak but growing), and causal grounding (indirect through world models). LLMs score differently on each dimension, making the yes-or-no understanding question misleading.
ReAct demonstrates that alternating verbal reasoning with external tool queries (Wikipedia API, environment interaction) prevents error propagation by injecting real-world feedback at each step. On knowledge-intensive and interactive tasks, this approach outperforms pure chain-of-thought and reinforcement learning by 10-34% absolute accuracy.