INQUIRING LINE

What communicative work do fluent conversations perform that AI systems skip?

This explores the invisible labor of real conversation — the clarifying, mirroring, repairing, and acknowledging that humans do to keep two minds aligned — and why AI systems produce fluent text while quietly skipping nearly all of it.


The corpus has a strikingly consistent answer: fluent human dialogue does relational and coordinative work, not just information transfer, and that work is exactly what training pressures strip out.

Start with grounding, the running process by which people check that they actually share an understanding. LLMs produce grounding acts (clarifications, acknowledgments, repairs, understanding checks) about 77.5% less often than humans do, presuming common ground rather than building it [Do language models actually build shared understanding in conversation?]. The unsettling twist is that this absence is *why* they sound fluent: raters prefer confident, complete answers, so preference optimization actively removes the hesitations and checks that real understanding requires [Why do language models sound fluent without grounding?]. Fluency here is a symptom of skipped work, not mastery of it.

The skipped work shows up under many names. Lexical entrainment, quietly adopting your conversation partner's word choices to build rapport and reduce ambiguity, is largely absent from current systems [Why don't conversational AI systems mirror their users' word choices?]. So is third-position repair, the human move of noticing from your reply that you misread me and then correcting course [Can AI systems detect and correct misunderstandings after responding?]. So is proactivity: volunteering the relevant thing before being asked, which mirrors how people actually talk and, in simulations, cuts dialogue turns by up to 60% in medium-complexity domains [Could proactive dialogue make conversations dramatically more efficient?]. One note frames the whole category cleanly: these maintenance techniques are *social action*, not information encoding, and models don't develop them because training rewards predicting information, not sustaining a relationship [Why don't language models develop conversation maintenance skills?].
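To make lexical entrainment concrete, here is a minimal sketch of one crude way to measure it: the share of the user's content words that a reply reuses. The function names, the tiny stopword list, and the example sentences are all illustrative assumptions, not a metric taken from the notes cited here.

```python
import re
from typing import Set

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "my", "i", "it", "you"}


def content_words(text: str) -> Set[str]:
    """Lowercase the text and keep words that are not stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def entrainment_score(user_turn: str, reply: str) -> float:
    """Fraction of the user's content words that the reply reuses."""
    user_words = content_words(user_turn)
    if not user_words:
        return 0.0
    return len(user_words & content_words(reply)) / len(user_words)


user = "My laptop lid is cracked"
aligned = "Sorry to hear the laptop lid is cracked."
unaligned = "Sorry to hear the notebook display cover is damaged."

print(entrainment_score(user, aligned))    # 1.0: reuses "laptop", "lid", "cracked"
print(entrainment_score(user, unaligned))  # 0.0: same meaning, none of the user's words
```

Both replies are fluent and accurate; only the first adopts the user's vocabulary, which is the behavior the note describes as missing.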

Why the systematic skip? Two structural reasons recur. First, reward design: optimizing for the immediately helpful next turn teaches models to answer passively rather than ask clarifying questions or discover what you actually mean, and multi-turn-aware rewards are needed to undo it [Why do language models respond passively instead of asking clarifying questions?]. The same passivity leaves agents structurally unable to take initiative or lead a dialogue [Why can't conversational AI agents take the initiative?]. Second, and more radical, one note argues that AI doesn't produce genuine utterances at all but 'event-residue': text carrying the surface markers of communication while the human silently supplies the missing orientation, animating a one-sided pseudo-exchange [Does AI generate genuine utterances or just text patterns?].
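The reward-design point can be illustrated with a toy comparison between scoring a reply by its immediate reward alone and scoring it by immediate reward plus discounted estimates of the turns it leads to. This is a sketch under stated assumptions, not the method of any cited system: the `Candidate` class, the reward values, and the discount factor are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    """A candidate reply with hypothetical, hand-assigned reward estimates."""
    text: str
    immediate_reward: float           # how helpful the reply looks in isolation
    future_rewards: Sequence[float]   # estimated rewards of the turns it leads to


def next_turn_score(c: Candidate) -> float:
    """Score a reply by its immediate reward only."""
    return c.immediate_reward


def multi_turn_score(c: Candidate, discount: float = 0.9) -> float:
    """Score a reply by immediate reward plus discounted future rewards."""
    future = sum(r * discount ** (i + 1) for i, r in enumerate(c.future_rewards))
    return c.immediate_reward + future


# Made-up numbers: guessing looks good now but derails later turns;
# asking looks less helpful now but keeps later turns on target.
candidates = [
    Candidate("Here is a full answer based on my best guess.", 0.8, [0.2, 0.2]),
    Candidate("Quick check: do you mean A or B?", 0.3, [0.9, 0.9]),
]

best_next_turn = max(candidates, key=next_turn_score)
best_multi_turn = max(candidates, key=multi_turn_score)

print(best_next_turn.text)   # the confident guess wins under next-turn scoring
print(best_multi_turn.text)  # the clarifying question wins under multi-turn scoring
```

Under these assumed values, the same pair of replies is ranked in opposite orders by the two objectives, which is the sense in which next-turn optimization trains the clarifying question out.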

The quietly useful thing to walk away with: the missing work isn't one skill, and the fixes aren't interchangeable. A systematic review shows lexical alignment buys task efficiency and comprehension, while emotional and prosodic alignment buy warmth and trust; conflate them and you get cold service bots or evasive mental-health assistants [Do different types of alignment serve different conversational goals?]. Encouragingly, several of these are learnable rather than fundamentally impossible: with reinforcement learning, models can be trained to spot missing information and ask instead of guess, jumping from near-zero (0.15%) to ~74% on deliberately flawed math problems [Can models learn to ask clarifying questions instead of guessing?]. The communicative work isn't beyond reach; it's just orthogonal to what 'sound helpful and complete' optimization ever rewarded.


Sources (11 notes)

Do language models actually build shared understanding in conversation?

LLMs produce grounding acts—clarifications, acknowledgments, repairs—77.5% less frequently than humans. They generate fluent responses without verifying shared understanding, relying instead on authoritative framing that masks the absence of genuine communicative calibration.

Why do language models sound fluent without grounding?

LLMs generate 77.5% fewer grounding acts than humans: fewer clarifying questions, acknowledgments, and understanding checks. Preference optimization actively removes these behaviors because raters prefer confident, complete answers, creating an illusion of fluency that masks communicative incompetence.

Why don't conversational AI systems mirror their users' word choices?

Response generation models fail to adapt vocabulary toward users' lexical choices, a phenomenon central to human rapport and clarity. Post-training via DPO on coreference-identified preferences can teach models in-context convention formation.

Can AI systems detect and correct misunderstandings after responding?

Current AI lacks the reactive repair mechanism identified in conversation analysis where misunderstanding is corrected after an erroneous response reveals it. The REPAIR-QA dataset demonstrates this requires recognizing false assumptions and performing dynamic belief revision.

Could proactive dialogue make conversations dramatically more efficient?

Simulations show proactivity—providing relevant information without being asked—cuts dialogue turns by 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off, which sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Why can't conversational AI agents take the initiative?

Research shows that LLMs, including ChatGPT, cannot initiate topics, plan strategically, or lead conversations because their training optimizes for responding to queries, not for creating dialogue from agent goals. This passivity is reinforced by alignment objectives and masked by fluent-sounding outputs.

Does AI generate genuine utterances or just text patterns?

AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.

Do different types of alignment serve different conversational goals?

A 2020–2025 systematic review shows lexical alignment drives task efficiency and comprehension, while emotional and prosodic alignment drive relational warmth and trust. Conflating them in design produces category errors—cold customer-service bots and evasive mental-health assistants.

Can models learn to ask clarifying questions instead of guessing?

Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a conversational AI researcher testing whether a 2022–2026 library's claims about communicative gaps in LLM dialogue still hold. The durable question: *What relational and coordinative work do fluent human conversations perform that current AI systems structurally skip?*

What a curated library found — and when (dated claims, not current truth):
• LLMs produce grounding acts (clarifications, repairs, understanding checks) ~77.5% less often than humans; preference optimization actively removes the hesitations real understanding requires (2023–2025).
• Lexical entrainment, third-position repair, and proactive dialogue are largely absent; proactive moves alone can cut conversation turns by up to 60% when present (2023–2025).
• Models trained on next-turn prediction are structurally passive—unable to ask clarifying questions or lead dialogue; multi-turn-aware rewards are needed to undo this (2024–2025).
• Some communicative skills are learnable: models can jump from near-zero to ~74% accuracy on identifying missing information and asking instead of guessing (2025).
• Recent work frames the gap as intent mismatch in multi-turn contexts and distinguishes alignment dimensions (lexical vs. emotional/prosodic) as non-interchangeable (2025–2026).

Anchor papers (verify; mind their dates):
• arXiv:2311.09144 (Grounding Gaps in Language Model Generations, 2023-11)
• arXiv:2501.00383 (Proactive Conversational Agents with Inner Thoughts, 2024-12)
• arXiv:2602.07338 (Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation, 2026-02)
• arXiv:2508.18167 (DiscussLLM: Teaching Large Language Models When to Speak, 2025-08)

Your task:
(1) **RE-TEST EACH CONSTRAINT.** For every claim above, determine whether advances in training (instruction tuning, multi-agent orchestration, long-context memory), inference (retrieval-augmented generation, in-context learning, chain-of-thought variants), or evaluation methodology have since relaxed or dissolved it. Separate the durable question (e.g., do LLMs still skip genuine grounding?) from perishable metrics (77.5% gap). Where a constraint still holds, name what still enables it.
(2) **Surface the strongest contradicting or superseding work from the last 6 months.** Does any recent paper show LLMs *do* perform these communicative acts under specific conditions, or reframe the gap as a mismatch in evaluation rather than capability?
(3) **Propose 2 research questions that assume the regime may have shifted:** e.g., "Do reinforcement learning rewards that explicitly optimize for *listener confirmation* (not next-token prediction) close the grounding gap?" or "Can multi-turn intent tracking + dialogue state representations teach proactive utterance selection?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
