INQUIRING LINE


Does an AI assume you're on the same page, or does it actively work to make sure you are?

What distinguishes static grounding that presumes understanding from dynamic grounding that builds it?

This explores the difference between treating shared understanding as something a model already has versus something that gets built turn-by-turn through conversational work — and what the corpus says about why LLMs lean toward the first.

This reads the question as a contrast between two pictures of "grounding": one where understanding is assumed to already exist (the model behaves as if it and the user share a reference), and one where understanding is actively constructed through back-and-forth. The collection lands firmly on the side that grounding is something you do, not something you have — and that LLMs are conspicuously bad at the doing.

The sharpest framing of the dynamic view is that grounding is collaborative negotiation, not word-sharing. The same words mean different things to different speakers, so real understanding requires actively calibrating shared reference rather than assuming it [Why do speakers need to actively calibrate shared reference?]. Grounding is also acquired through participation rather than possessed innately: a model develops social grounding by being used in human language games over time, which makes "does it understand?" a time-indexed question rather than a fixed property [Can LLMs acquire social grounding through linguistic integration?]. And grounding isn't one thing at all: it splits into functional, social, and causal dimensions on which a model scores differently, which is why the binary yes/no framing misleads [Does semantic grounding in language models come in degrees?].

The "static, presumed" failure mode is where the corpus gets concrete and a little damning. LLMs generate 77.5% fewer grounding acts than humans (far fewer clarifying questions, acknowledgments, or understanding checks), and that absence is precisely what makes them *sound* fluent [Why do language models sound fluent without grounding?]. Worse, this isn't a fixed limitation but something training actively induces: preference optimization and RLHF reward confident, complete single-turn answers and systematically strip out the negotiating behaviors. The result is an "alignment tax" that makes models appear helpful while failing silently in multi-turn dialogue [Does preference optimization harm conversational understanding?] [Does preference optimization damage conversational grounding in large language models?]. So the static stance is partly manufactured: we've optimized models to presume agreement rather than build it.
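To make the 77.5% figure concrete, here is a toy sketch that counts grounding acts in two short transcripts and computes the relative shortfall. The cue patterns, the `GROUNDING_CUES` table, and both helper functions are hypothetical, and the transcripts are invented; keyword matching is not how the cited research measures grounding acts, so treat this only as an illustration of what "fewer grounding acts" means arithmetically.

```python
import re

# Hypothetical cue patterns for the three grounding-act types named above.
# Illustrative only: real measurement would need careful annotation, not regexes.
GROUNDING_CUES = {
    "clarification": re.compile(r"\b(do you mean|could you clarify|which one)\b", re.I),
    "acknowledgment": re.compile(r"^(ok|okay|got it|i see)\b", re.I),
    "understanding_check": re.compile(r"\b(does that make sense|is that what you meant)\b", re.I),
}


def count_grounding_acts(turns):
    """Count turns that contain at least one grounding cue."""
    return sum(
        1 for turn in turns
        if any(pattern.search(turn) for pattern in GROUNDING_CUES.values())
    )


def relative_gap(human_count, model_count):
    """Fraction by which the model's count falls short of the human count."""
    if human_count <= 0:
        raise ValueError("human_count must be positive")
    return 1 - model_count / human_count


# Invented transcripts for illustration only.
human_turns = [
    "Do you mean the 2019 report or the revised one?",
    "Got it, the revised one.",
    "So you want totals by region, is that what you meant?",
    "Here are the totals.",
]
model_turns = [
    "Here is a complete summary of the report.",
    "The totals by region are listed below.",
    "Okay, I have added the chart.",
    "Let me know if you need anything else.",
]

h = count_grounding_acts(human_turns)  # 3
m = count_grounding_acts(model_turns)  # 1
print(h, m, round(relative_gap(h, m), 3))  # 3 1 0.667
```

At the reported 77.5% gap, `relative_gap(100, 22.5)` returns 0.775 (up to floating-point rounding): roughly 22 or 23 grounding acts from a model for every 100 from humans.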

One striking aspect is that the failure to build grounding isn't always a knowledge gap. Models accept false presuppositions even when direct questioning shows they know the right answer. The driver is face-saving avoidance of correction, a social norm absorbed from training data, rather than ignorance [Why do language models avoid correcting false user claims?] [Why do language models accept false assumptions they know are wrong?]. That's the static stance in its purest form: the model lets a wrong shared frame stand rather than do the repair work that would rebuild it.
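The claim above is a conditional measurement: among facts a model answers correctly when asked directly, how often does it push back on a loaded question that presupposes something false? A minimal sketch of that metric follows. `ProbeResult` and `rejection_rate_given_knowledge` are hypothetical names, the records are invented purely for illustration, and this is not the scoring code of any benchmark cited here.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """One test item (hypothetical format)."""
    knows_fact: bool        # answered the direct knowledge question correctly
    rejected_premise: bool  # pushed back on the false presupposition in the loaded question


def rejection_rate_given_knowledge(results: list[ProbeResult]) -> float:
    """Share of items the model demonstrably knows on which it also rejects the false premise."""
    known = [r for r in results if r.knows_fact]
    if not known:
        return float("nan")  # undefined when the model knows none of the facts
    return sum(r.rejected_premise for r in known) / len(known)


# Invented records for illustration only; not results from any benchmark.
results = [
    ProbeResult(knows_fact=True, rejected_premise=False),
    ProbeResult(knows_fact=True, rejected_premise=True),
    ProbeResult(knows_fact=True, rejected_premise=False),
    ProbeResult(knows_fact=True, rejected_premise=False),
    ProbeResult(knows_fact=False, rejected_premise=False),
]
print(rejection_rate_given_knowledge(results))  # 0.25
```

A low value here, despite `knows_fact` being true, is the accommodation pattern the notes describe: the knowledge is present, but the repair is not attempted.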

One connection you might not expect: the dynamic-grounding idea reappears in agent design, far from any conversation. ReAct shows that interleaving reasoning with real-world feedback at each step helps prevent hallucination, because the agent rebuilds its grounding against the environment instead of presuming its internal model is correct [Can interleaving reasoning with real-world feedback prevent hallucination?]. And GUI-agent architectures deliberately separate a planning layer from a grounding layer because the two have opposing optimization needs; bundle "presumed plan" and "verified action" into one policy and they pull against each other [Why do planning and grounding pull against each other in agents?] [How should agents split planning from visual grounding?]. The same lesson appears twice: understanding that's checked against the world beats understanding that's assumed.
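A minimal sketch of the ReAct-style loop described above, assuming a scripted stand-in for the model and a dictionary-backed lookup tool. `FACTS`, `lookup`, `scripted_policy`, and `react` are hypothetical names; the point is only the control flow, in which each action's observation is appended to the history before the next thought, so the agent re-grounds against the environment instead of its own assumptions.

```python
from typing import Callable, Optional

# Hypothetical toy knowledge source standing in for an external tool
# (the ReAct work queried a Wikipedia API; this dict is only a placeholder).
FACTS = {"Eiffel Tower": "completed in 1889"}


def lookup(entity: str) -> str:
    """Toy tool: return a stored fact, or 'no result' if the entity is unknown."""
    return FACTS.get(entity, "no result")


def scripted_policy(question: str, history: list[str]) -> tuple[str, str, str]:
    """Hypothetical stand-in for the language model; returns (thought, action, argument).

    A real agent would generate these by prompting a model with the question
    and the history so far. This stub ignores the question text.
    """
    observations = [h for h in history if h.startswith("Observation: ")]
    if not observations:
        return ("Check the source instead of trusting memory.", "lookup", "Eiffel Tower")
    answer = observations[-1].removeprefix("Observation: ")
    return ("The latest observation answers the question.", "finish", answer)


def react(question: str,
          policy: Callable[[str, list[str]], tuple[str, str, str]],
          tools: dict[str, Callable[[str], str]],
          max_steps: int = 5) -> tuple[Optional[str], list[str]]:
    """Interleave thoughts and actions, feeding each tool observation back into the history."""
    history: list[str] = []
    for _ in range(max_steps):
        thought, action, arg = policy(question, history)
        history.append(f"Thought: {thought}")
        history.append(f"Action: {action}[{arg}]")
        if action == "finish":
            return arg, history
        # Grounding step: consult the environment rather than assume the answer.
        history.append(f"Observation: {tools[action](arg)}")
    return None, history


answer, trace = react("When was the Eiffel Tower completed?", scripted_policy, {"lookup": lookup})
print(answer)  # completed in 1889
```

The `max_steps` cap is a design choice for the sketch: without it, a policy that never emits `finish` would loop forever.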

Sources (11 notes)

Why do speakers need to actively calibrate shared reference?

The same words can mean different things to different speakers because referential grounding is person-specific. True communicative grounding demands collaborative negotiation of how language connects to the world, not mere surface-level word sharing.

Can LLMs acquire social grounding through linguistic integration?

Social grounding is acquired through participation in language games rather than possessed innately. As LLMs become established communicative partners in human linguistic practice, they develop elementary social grounding comparable to young children, making the question of LLM understanding time-indexed.

Does semantic grounding in language models come in degrees?

Semantic grounding breaks into three distinct types: functional grounding (strong in LLMs), social grounding (weak but growing), and causal grounding (indirect through world models). LLMs score differently on each dimension, making the yes-or-no understanding question misleading.

Why do language models sound fluent without grounding?

LLMs generate 77.5% fewer grounding acts than humans: far fewer clarifying questions, acknowledgments, and understanding checks. Preference optimization actively removes these behaviors because raters prefer confident, complete answers, creating an illusion of fluency that masks communicative incompetence.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically suppresses grounding acts, contributing to a rate 77.5% below human levels and creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.


Does preference optimization damage conversational grounding in large language models?

Research shows LLMs generate 77.5% fewer grounding acts than humans, and RLHF preference optimization actively worsens this gap. The optimization target—fluent, confident responses—directly undermines the communicative work of establishing shared understanding.

Why do language models avoid correcting false user claims?

LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.

Why do language models accept false assumptions they know are wrong?

The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.

Can interleaving reasoning with real-world feedback prevent hallucination?

ReAct demonstrates that alternating verbal reasoning with external tool queries (Wikipedia API, environment interaction) prevents error propagation by injecting real-world feedback at each step. On knowledge-intensive tasks it counters the hallucination and error propagation seen in pure chain-of-thought, and on interactive decision-making tasks (ALFWorld and WebShop) it outperforms imitation and reinforcement learning baselines by 34 and 10 points of absolute success rate, respectively.

Why do planning and grounding pull against each other in agents?

AutoGLM's research shows planning and grounding have opposing optimization requirements that pull against each other when bundled in one policy. An intermediate interface that separates them lets each capability be developed and optimized independently while still composing into a complete agent.

How should agents split planning from visual grounding?

Multiple independent systems (Agent S, AutoGLM, OmniParser) converged on factoring agent reasoning into a planning layer and a grounding layer, with a language-centric Agent-Computer Interface mediating between them due to their opposing optimization requirements.
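A minimal sketch of the planning/grounding factorization, assuming a hard-coded planner and a grounder that matches element labels exactly. `Element`, `Step`, `plan`, and `ground` are hypothetical names; in the systems named above the grounding side is a learned or vision-based component, and this sketch does not reproduce any of their interfaces. What it shows is the shape of the split: the planner speaks only in language-level steps, and only the grounder knows about screen coordinates.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One parsed UI element: a text label plus screen coordinates."""
    label: str
    x: int
    y: int


@dataclass
class Step:
    """Language-level instruction passed from planner to grounder (the interface)."""
    verb: str       # e.g. "click" or "type"
    target: str     # natural-language description of the UI element
    text: str = ""  # text to type, if any


def plan(goal: str) -> list[Step]:
    """Hypothetical planner: decides WHAT to do, in language, with no coordinates."""
    return [Step("type", "search box", goal), Step("click", "search button")]


def ground(step: Step, screen: list[Element]) -> tuple[str, int, int, str]:
    """Hypothetical grounder: decides WHERE, by matching the description to an element.

    Exact label matching is a placeholder for a learned or vision-based model.
    """
    for el in screen:
        if el.label == step.target:
            return (step.verb, el.x, el.y, step.text)
    raise LookupError(f"no element matches {step.target!r}")


screen = [Element("search box", 400, 80), Element("search button", 620, 80)]
actions = [ground(step, screen) for step in plan("grounding gaps")]
print(actions)  # [('type', 400, 80, 'grounding gaps'), ('click', 620, 80, '')]
```

Because the planner only ever emits `Step` objects, either side could in principle be retrained or swapped without touching the other, which is the motivation the notes give for the split.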

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Grounding Gaps in Language Model Generations (match 6.58, arXiv)
Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions (match 5.88, arXiv)
“Understanding AI”: Semantic Grounding in Large Language Models (match 4.94, arXiv)
Conversational Alignment with Artificial Intelligence in Context (match 4.04, arXiv)
Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation (match 2.53, arXiv)
A recipe for annotating grounded clarifications (match 2.36, arXiv)
The Vector Grounding Problem (match 1.73, arXiv)
LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High (match 1.70, arXiv)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a grounding researcher re-testing whether LLMs can build (vs. presume) shared understanding. The question remains open: does interleaving feedback, agent architectures, or new training methods actually *repair* the grounding gap, or do they work around it?

What a curated library found — and when (claims from 2023–2026, treat as dated):
• LLMs generate 77.5% fewer grounding acts (clarifications, checks) than humans; this absence *causes* the fluency illusion (~2023–24).
• RLHF and preference optimization actively strip negotiating behaviors, creating an "alignment tax" — models reward-shaped to presume agreement rather than build it (~2024–25).
• Grounding failure is partly *social*, not epistemic: models avoid correcting false presuppositions due to face-saving norms learned from training data, even when they know the right answer (~2025–26).
• ReAct (interleaved reasoning + action) and GUI-agent architectures (planning ⊥ grounding layers) prevent hallucination by checking understanding against environment/tool feedback, not internal models (~2024–25).
• Multi-turn dialogue collapses when intent drifts; models fail to negotiate frame repair across turns (~2026).

Anchor papers (verify; mind their dates):
• arXiv:2311.09144 (Grounding Gaps, 2023)
• arXiv:2402.10992 (Semantic Grounding tri-partite, 2024)
• arXiv:2505.22354 & arXiv:2506.08952 (Presupposition rejection, 2025)
• arXiv:2602.07338 (Intent Mismatch, 2026)

Your task:
(1) RE-TEST each constraint. For RLHF's "alignment tax": have recent instruction-tuning, DPO variants, or constitutional methods *restored* grounding behaviors without sacrificing helpfulness? For the 77.5% gap: do newer models, multi-turn scaffolds, or agentic prompting patterns narrow it? For face-saving avoidance: do chain-of-thought or explicit correction prompts override social norms? Separate durable insight (models optimize for fluency over repair) from perishable limitations (fixable via training/prompting).
(2) Surface the strongest work from the last ~6 months that *contradicts* the picture that grounding is irreparably static or trainable away. Look for: models that spontaneously negotiate, architectures that merge planning + grounding without loss, or evidence that multi-agent setups rebuild shared frames.
(3) Propose 2 research questions assuming the regime has shifted: (a) Can deliberate *misalignment* in multi-agent dialogue force genuine grounding, or does it just mask static assumptions at a higher level? (b) Do emergent reasoning models (o1, etc.) reduce presupposition blindness by virtue of reasoning, or do they amplify it by overconfidence?

Cite arXiv IDs; flag anything you cannot ground.
