SYNTHESIS NOTE

Why do language models fail at communicative optimization?

LLMs excel at learning surface statistical patterns from text but struggle with deeper principles of how language achieves efficient communication. What distinguishes these two types of linguistic knowledge?

Synthesis note · 2026-02-21 · sourced from Linguistics, NLP, NLU

"Do Large Language Models Resemble Humans in Language Use?" (Cai et al. 2023) evaluates LLMs on a wide range of human linguistic regularities, covering psycholinguistic phenomena as well as grammaticality. The results show a consistent pattern of success and failure that tracks a specific distinction: whether a regularity is recoverable from distributional patterns in text or arises from communicative optimization.

LLMs succeed on:

Sound symbolism (maluma/takete roundness-spikiness associations)
Sound-gender associations
Structural priming (reusing the syntactic structure of a preceding prime sentence)
Semantic priming (accessing recently primed word meanings)
Dialect sensitivity (accessing dialect-appropriate vocabulary based on stated interlocutor identity)

These regularities are learnable from distributional patterns in text — they appear consistently across large corpora and can be acquired through form-to-form prediction.

LLMs fail on:

Word length economy (choosing shorter forms when context is predictive — the Zipfian efficiency principle)
Syntactic ambiguity resolution (selecting the contextually appropriate reading of ambiguous syntax)
Semantic illusions (detecting incongruent words in otherwise coherent sentences)
Drawing discourse inferences (bridging two pieces of information)

These regularities require something beyond distributional pattern matching. They involve principles of why language works for communication — efficiency under communicative pressure, contextual interpretation that goes beyond local statistics, integration across discourse.
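As an illustration of how the word length economy item could be scored, here is a minimal sketch. It assumes a forced-choice setup in which each trial offers a short and a long form of the same word (for example "math" vs. "mathematics") after either a predictive or a neutral context. The function names, the trial format, and the sample responses are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def short_form_rates(trials):
    """Return the share of short-form choices per context type.

    `trials` is an iterable of (context_type, chose_short) pairs, where
    context_type is "predictive" or "neutral" and chose_short is a bool.
    This trial format is an assumption made for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # context -> [short choices, total]
    for context_type, chose_short in trials:
        counts[context_type][0] += int(chose_short)
        counts[context_type][1] += 1
    return {ctx: short / total for ctx, (short, total) in counts.items()}


def economy_effect(trials):
    """Short-form rate in predictive contexts minus the rate in neutral ones."""
    rates = short_form_rates(trials)
    return rates["predictive"] - rates["neutral"]


# Made-up responses, for illustration only (not data from any study).
trials = [
    ("predictive", True), ("predictive", True),
    ("predictive", True), ("predictive", False),
    ("neutral", True), ("neutral", False),
    ("neutral", False), ("neutral", False),
]

rates = short_form_rates(trials)
effect = economy_effect(trials)
print(rates, effect)
```

Under this scoring, a clearly positive effect would indicate the efficiency pattern (shorter forms where context is predictive), while an effect near zero would correspond to the failure described above. A real analysis would also need a significance test over items and trials, which this sketch omits.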

The discriminating principle: regularities that show up as consistent statistical patterns in training data transfer to the model. Regularities that emerge from communicative optimization (the pragmatic logic of why language has the forms it does) do not transfer, because they are not present in surface form as trainable signals.

Inquiring lines that read this note 16

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How do language models establish social grounding in human dialogue?

Do language models understand semantics or rely on pattern matching?

How faithfully do LLMs reflect their actual reasoning in outputs and explanations?

Do language models perform faithful symbolic reasoning independent of semantic grounding?

Do LLMs have functional linguistic competence or only formal language ability?

Do language models learn genuine linguistic structure or just surface patterns?

What critical LLM failures do standard benchmarks hide?

Is embodied interaction necessary for language meaning and genuine agency?

What distinguishes surface language form from communicative operation?

How do standardized protocols improve coordination in multi-agent systems?

What distinguishes communicative acts from operational actions in agentic LLMs?

How can LLM recommenders match or exceed collaborative filtering performance?

Why do LLMs rely on content knowledge instead of collaborative signals?

Related concepts in this collection 5

Can models pass tests while missing the actual grammar? Do language models succeed on grammatical benchmarks by learning surface patterns rather than structural rules? This matters because correct outputs may hide reliance on shallow heuristics that fail on novel structures.
this paper provides the specific taxonomy of which generalizations do and don't transfer
Why does ChatGPT fail at implicit discourse relations? ChatGPT excels when discourse connectives are present but drops to 24% accuracy without them. What does this gap reveal about how LLMs actually process meaning and logical relationships?
same pattern: surface-explicit success, contextual-implicit failure
Why do LLMs handle causal reasoning better than temporal reasoning? Exploring whether language models perform asymmetrically on different discourse relations and what training data patterns might explain the gap between causal and temporal reasoning abilities.
parallel: what's in the training distribution determines what's learned
Why do speakers deliberately use ambiguous language? Explores whether ambiguity is a linguistic defect or a strategic tool speakers use for efficiency, politeness, and deniability. Matters because it challenges how we train language systems.
ambiguity management is one of the failed communicative optimization principles
Why don't LLMs shorten messages like humans do? Humans naturally develop shorter, efficient language during conversations. Do multimodal LLMs exhibit this same spontaneous adaptation, or do they lack this communicative behavior?
ICCA finding: convention formation (becoming more efficient through interaction) is precisely a communicative optimization principle that LLMs fail to acquire; they understand efficient language but don't produce it
