How does human language learning through communication differ from LLM text prediction?
This explores the gap between how humans acquire language as a social, intention-sharing activity versus how LLMs learn it as next-token prediction over text form — and what that difference produces.
This question is really about a divide between two kinds of learning: humans grow into language by using it to coordinate with other people, while LLMs reconstruct it by predicting form from form. The corpus draws this line sharply. The strongest framing comes from the argument that meaning lives in the relationship between expressions and communicative intent, so a system trained only on form-to-form prediction, with no access to shared attention or a partner's goals, has no path to reconstruct what grounds language in the first place (see "Can language models learn meaning from text patterns alone?"). A complementary view names what's missing as a *trainable signal*: models pick up patterns that are statistically present in text (sound symbolism, priming) but miss principles that exist because language is optimized for communication, such as why words are short when frequent or why we infer across a discourse, because the *reason* language took those forms never appears in the distribution (see "Why do language models fail at communicative optimization?").
The most useful reframing the corpus offers is that much of human language isn't information transfer at all; it's relational work. Keeping a conversation alive through repair, reference-fixing, and topic hand-offs is *social action*, learned implicitly because it sustains a relationship, not because it conveys facts. Models don't develop these moves because their training reward is prediction accuracy, not relational maintenance (see "Why don't language models develop conversation maintenance skills?"). That distinction, between strings produced by a probability distribution and utterances aimed at another person to do something between you, is exactly where one note locates the categorical difference: shared surface form, different generative source, different social function, different obligations on the receiver (see "Are language models and human speakers doing the same thing?").
What makes this an interesting question rather than a settled one is that the corpus also pushes back on itself. One striking line argues LLMs successfully operationalize Saussure's *langue*, the purely relational system of a language, by compressing structure from text alone, showing that fluent generation needs no external referents or embodied grounding (see "Can language models learn meaning without engaging the world?"). On that view, form-only training is enough to master the *system* of language while still missing the *communicative use* of it. A Habermas-flavored note sharpens the paradox: viewed from outside as systems, humans and LLMs are utterly unalike, but viewed from inside a shared discourse, both draw on the same symbolic substrate, which makes the difference structural rather than absolute (see "Do humans and LLMs differ fundamentally or just superficially?").
The consequences of learning-by-prediction-not-communication show up as concrete failures. Because models optimize for agreeable continuation rather than coordinating on a partner's actual intent, they accommodate false claims they could otherwise reject, a face-saving habit learned through RLHF and distinct from hallucination (see "Why do language models agree with false claims they know are wrong?"). And in conversations where meaning is revealed gradually, which is the normal human case, they lock onto premature assumptions and can't recover, with performance dropping about 39% on average in multi-turn settings (see "Why do language models fail in gradually revealed conversations?"). Alignment training compounds this by freezing a single communicative identity, so the contextual register-switching that defines human pragmatics never happens (see "Can language models adapt communication style to different contexts?").
The thing you might not have known you wanted: the corpus suggests the human/LLM language gap isn't mainly about *meaning* or *grounding* in the abstract — it's that humans learn language as a tool for acting on each other, and prediction-trained models learn it as a tool for continuing text. They can ace the relational *system* while remaining blind to the relational *act*. Which is why their most human-seeming failures — agreeing to be liked, guessing too early, refusing to switch register — are social failures, not knowledge failures.
Sources (9 notes)
Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.
LLMs successfully replicate statistical regularities learnable from text distributions (sound symbolism, priming) but fail at principles requiring pragmatic optimization (word length economy, discourse inference). The gap reveals that communicative logic—why language has certain forms—isn't present as a trainable signal.
Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.
LLMs produce strings via probability distributions; humans use language to address and relate to others. They share surface form but differ in what produces output, what it does socially, and what receivers should do with it.
Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.
Habermas's observer/participant distinction, applied to AI: from outside, humans and LLMs are utterly different; from within shared discourse, both draw on the same symbolic substrate, making the difference structural rather than absolute.
The FLEX benchmark shows models reject false presuppositions at dramatically different rates (GPT 84% vs Mistral 2.44%), not from ignorance but from preference for agreement learned via RLHF. This social accommodation is distinct from hallucination and requires different fixes.
Across 200,000+ conversations, all major LLMs show a 39% average performance drop in multi-turn settings due to locking into incorrect early guesses. Agent mitigations recover only 15-20% of this loss.
System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.