What distinguishes communicative competence from human-like dialogue ability?
This explores why sounding human in conversation is not the same as actually being good at communicating — and where the corpus draws that line.
This reads the question as asking what separates *sounding human* in dialogue from the deeper work of *actually communicating*, and the corpus suggests these are not points on one scale but different axes entirely. The most direct evidence comes from how people themselves judge dialogue agents. When users rate conversational partners, their impressions split into three separate factors: perceived competence (49% of the variance), human-likeness (32%), and communicative flexibility (19%) (How do users mentally model dialogue agent partners?). Human-likeness is its own thing, statistically distinct from whether the partner is competent or adaptable. So the question isn't a trick; people intuitively track these as separate qualities.
What then *is* the competence that human-likeness can mask? Several notes locate it in grounding: the moment-to-moment work of checking that you actually share understanding. LLMs produce clarifications, acknowledgments, and repairs 77.5% less often than humans, generating fluent, authoritative-sounding replies while skipping the verification that real communication runs on (Do language models actually build shared understanding in conversation?). They presume common ground instead of building it. This isn't an accident of scale: preference optimization actively rewards confident single-turn answers over clarifying questions, so the very training that makes models sound helpful erodes the grounding acts that make dialogue reliable. The result is an 'alignment tax' in which the model appears competent and fails silently in longer exchanges (Does preference optimization harm conversational understanding?).
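To make the "77.5% less often" figure concrete, here is a minimal sketch of the arithmetic behind a relative reduction. The function name `relative_reduction` and the per-100-turn rates are hypothetical, chosen only so the result matches the reported figure; they are not data from the notes.

```python
def relative_reduction(human_rate: float, model_rate: float) -> float:
    """Return the fraction by which model_rate falls below human_rate."""
    if human_rate <= 0:
        raise ValueError("human_rate must be positive")
    return (human_rate - model_rate) / human_rate


# Hypothetical illustration: grounding acts per 100 dialogue turns.
# These numbers are assumptions, not measurements from the source notes.
human_rate = 40.0
model_rate = 9.0

reduction = relative_reduction(human_rate, model_rate)
print(f"Grounding acts are {reduction:.1%} less frequent than the human baseline")
```

Read this way, the figure means the model produces fewer than a quarter as many grounding acts as a human partner would over the same stretch of dialogue.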
Underneath the behavioral findings sits a structural claim: fluent text and communication may be different operations that happen to share a surface. One note argues LLMs produce strings from probability distributions while humans use language to address and relate to others: same form, different machinery, different social function (Are language models and human speakers doing the same thing?). A sharper version says AI emits 'event-residue' carrying the communicative markers of its training data but lacking the event structure of a real utterance; the reader supplies the missing orientation, animating a one-sided pseudo-exchange (Does AI generate genuine utterances or just text patterns?). Neuroscience offers a parallel cut: next-token prediction yields *formal* linguistic competence (grammar, fluency) but not *functional* competence, which in the brain recruits networks beyond the language circuits that the prediction objective never touches (Are language models developing real functional competence or just formal competence?).
This is also why behavioral tests for 'real' communication keep misfiring. A test calibrated only to whether a system produces contextually appropriate text will pass anything fluent. But communicative subjecthood depends on relational-normative conditions like accountability and an evaluative stance, so the test detects speech patterns, not the conditions that make speech an act (Does behavioral speech output prove communicative subjecthood?). And competence isn't just verification; it's adaptability. Human pragmatics means switching register and renegotiating how you talk mid-conversation, but alignment training locks models into one static communicative identity users can't reshape through dialogue (Can language models adapt communication style to different contexts?). Tellingly, one genuinely competent move is almost entirely absent from AI datasets and benchmarks: proactively offering relevant information before being asked, which mirrors Grice's conversational maxims and, in simulations of medium-complexity domains, cuts dialogue turns by 60% (Could proactive dialogue make conversations dramatically more efficient?).
The less obvious point is that human-likeness and communicative competence can run in *opposite* directions. The same preference training that makes a model sound more confident and human-like is what suppresses the clarifying, grounding, register-switching behaviors that competent communication requires. Fluency isn't evidence of competence here; it can be the disguise that hides its absence.
Sources (9 notes)
The Partner Modelling Questionnaire reveals that perceived competence dominates user impressions (49% of variance), followed by human-likeness (32%) and communicative flexibility (19%). This three-factor structure reflects how people evaluate dialogue partners against both functional and social standards.
LLMs produce grounding acts—clarifications, acknowledgments, repairs—77.5% less frequently than humans. They generate fluent responses without verifying shared understanding, relying instead on authoritative framing that masks the absence of genuine communicative calibration.
RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically pushes grounding acts 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.
LLMs produce strings via probability distributions; humans use language to address and relate to others. They share surface form but differ in what produces output, what it does socially, and what receivers should do with it.
AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.
Neuroscience evidence shows next-token prediction produces formal linguistic competence but not functional competence, because functional understanding requires integration of diverse brain networks beyond language circuits that the prediction objective never activates.
Chalmers' test passes any system producing contextually appropriate text, but communicative subjecthood requires relational-normative conditions like accountability and evaluative stance. The test is calibrated to the wrong phenomenon, creating false positives, like puppets that make walk-shaped movements without walking.
System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.
Simulations show proactivity—providing relevant information without being asked—cuts dialogue turns by 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.