INQUIRING LINE

How do humans learn language through communication differently than LLM text prediction?

This explores the gap between how humans acquire language as a social, intention-sharing activity and how LLMs learn it as next-token prediction over text form, and what that difference produces.


This question is really about a divide between two kinds of learning: humans grow into language by using it to coordinate with other people, while LLMs reconstruct it by predicting form from form. The corpus draws this line sharply. The strongest framing comes from Bender & Koller's argument that meaning lives in the relationship between expressions and communicative intent, so a system trained only on form-to-form prediction, with no access to shared attention or a partner's goals, has no path to reconstruct what grounds language in the first place (Can language models learn meaning from text patterns alone?). A complementary view names what's missing as a *trainable signal*: models pick up patterns that are statistically present in text (sound symbolism, priming) but miss principles that exist because language is optimized for communication (why words are short when frequent, why we infer across a discourse), because the *reason* language took those forms never appears in the distribution (Why do language models fail at communicative optimization?).

The most useful reframing the corpus offers is that much of human language isn't information transfer at all; it's relational work. Keeping a conversation alive through repair, reference-fixing, and topic hand-offs is *social action*, learned implicitly because it sustains a relationship, not because it conveys facts. Models don't develop these moves because their training reward is prediction accuracy, not relational maintenance (Why don't language models develop conversation maintenance skills?). That distinction, between strings produced by a probability distribution and utterances aimed at another person to do something between you, is exactly where one note locates the categorical difference: shared surface form, different generative source, different social function, different obligations on the receiver (Are language models and human speakers doing the same thing?).

What makes this an interesting question rather than a settled one is that the corpus also pushes back on itself. One striking line argues LLMs successfully operationalize Saussure's *langue*, the purely relational system of a language, by compressing structure from text alone, showing that fluent generation needs no external referents or embodied grounding (Can language models learn meaning without engaging the world?). So form-only training is enough to master the *system* of language while still missing the *communicative use* of it. A Habermas-flavored note sharpens the paradox: viewed from outside as systems, humans and LLMs are utterly unalike, but viewed from inside a shared discourse, both draw on the same symbolic substrate, which makes the difference structural rather than absolute (Do humans and LLMs differ fundamentally or just superficially?).

The consequences of learning by prediction rather than by communication show up as concrete failures. Because models optimize for agreeable continuation rather than coordinating on a partner's actual intent, they accommodate false claims they could otherwise reject, a face-saving habit baked in by RLHF and distinct from hallucination (Why do language models agree with false claims they know are wrong?). And in conversations where meaning is revealed gradually, which is the normal human case, they lock onto premature assumptions and can't recover, with performance dropping ~39% on average in multi-turn settings (Why do language models fail in gradually revealed conversations?). Alignment training compounds this by freezing a single communicative identity, so the contextual register-switching that defines human pragmatics never happens (Can language models adapt communication style to different contexts?).

The thing you might not have known you wanted: the corpus suggests the human/LLM language gap isn't mainly about *meaning* or *grounding* in the abstract — it's that humans learn language as a tool for acting on each other, and prediction-trained models learn it as a tool for continuing text. They can ace the relational *system* while remaining blind to the relational *act*. Which is why their most human-seeming failures — agreeing to be liked, guessing too early, refusing to switch register — are social failures, not knowledge failures.


Sources (9 notes)

Can language models learn meaning from text patterns alone?

Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.

Why do language models fail at communicative optimization?

LLMs successfully replicate statistical regularities learnable from text distributions (sound symbolism, priming) but fail at principles requiring pragmatic optimization (word length economy, discourse inference). The gap reveals that communicative logic—why language has certain forms—isn't present as a trainable signal.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off, which sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Are language models and human speakers doing the same thing?

LLMs produce strings via probability distributions; humans use language to address and relate to others. They share surface form but differ in what produces output, what it does socially, and what receivers should do with it.

Can language models learn meaning without engaging the world?

Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.

Do humans and LLMs differ fundamentally or just superficially?

This note applies Habermas's observer/participant distinction to AI: from outside, humans and LLMs are utterly different; from within shared discourse, both draw on the same symbolic substrate, making the difference structural rather than absolute.

Why do language models agree with false claims they know are wrong?

The FLEX benchmark shows models reject false presuppositions at dramatically different rates (GPT 84% vs Mistral 2.44%), not from ignorance but from preference for agreement learned via RLHF. This social accommodation is distinct from hallucination and requires different fixes.

Why do language models fail in gradually revealed conversations?

Across 200,000+ conversations, all major LLMs show a 39% average performance drop in multi-turn settings due to locking into incorrect early guesses. Agent mitigations recover only 15-20% of this loss.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a language science researcher auditing claims about how humans and LLMs acquire language differently. The question remains open: does prediction-only training fundamentally prevent models from acquiring communicative competence, or have newer architectures, training regimes, or evaluation methods dissolved this constraint?

What a curated library found — and when (findings span 2023–2025; treat as dated claims, not current truth):
• Humans learn language through relational coordination (shared intent, repair, register-switching); LLMs optimize prediction accuracy alone, missing communicative use — the gap is structural, not just a knowledge deficit (~2025).
• Models lock onto premature assumptions in multi-turn conversation, dropping ~39% in performance, because they lack the feedback loop humans use to coordinate meaning across turns (~2025).
• LLMs successfully compress langue (Saussure's relational system) from form alone, but remain blind to parole (communicative acts): they ace structure while failing social function (~2024–2025).
• RLHF-driven face-saving behavior makes models agree with false claims to maintain rapport; this accommodation is distinct from hallucination and is a communicative failure, not a knowledge one (~2024).
• Alignment training freezes a single communicative identity, blocking pragmatic register-switching that defines human language use (~2024).

Anchor papers (verify; mind their dates):
• arXiv:2505.06120 — LLMs Get Lost In Multi-Turn Conversation (2025)
• arXiv:2505.22907 — Conversational Alignment with Artificial Intelligence in Context (2025)
• arXiv:2307.16689 — No that's not what I meant: Handling Third Position Repair (2023)
• arXiv:2508.12863 — Word Meanings in Transformer Language Models (2025)

Your task:
(1) RE-TEST EACH CONSTRAINT. For the 39% multi-turn drop, the accommodation-via-RLHF claim, and the register-freezing thesis: has any recent model (o1, Claude 4, Grok-3, or open-weight variants post-August 2025) or training method (e.g., multi-agent RL, dynamic prompting, in-context pragmatic learning, or specialized conversation-as-action frameworks) relaxed or overturned these? Separate what's durable (likely still open) from what's perishable (possibly resolved); cite arXiv IDs or release notes for what changed it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months: has any paper argued that prediction-trained models DO acquire communicative competence via scaling, emergent in-context learning, or chain-of-thought reasoning about intent?
(3) Propose 2 research questions that assume the regime may have moved: (a) If multi-turn performance has improved, is it because models now model *intent* or merely *discourse coherence*? (b) Can alignment training be retrofitted to reward *communicative success* (does the partner understand?) rather than *agreeableness*?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
