SYNTHESIS NOTE

Can language models learn meaning from text patterns alone?

Explores whether training on form alone—predicting the next word from prior words—could ever give language models access to communicative intent and genuine semantic understanding.

Synthesis note · 2026-02-21 · sourced from Linguistics, NLP, NLU

Bender & Koller (2020) make a specific structural argument, not just an intuitive one. They define meaning as the relation M ⊆ E × I: the set of pairs (e, i) of natural language expressions e and the communicative intents i they can be used to evoke. Understanding language means retrieving i given e. But communicative intents are about something outside of language, so form alone (marks on a page, pixels, bytes) is insufficient for recovering them.
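As a rough illustration of this definition, the relation can be sketched as a set of pairs. The expressions, the intent labels, and the function names `understand` and `forms_only` are hypothetical stand-ins, not anything taken from Bender & Koller. In particular, real communicative intents are not strings; the labels here only mark where something outside language would sit.

```python
# Hypothetical toy relation M ⊆ E × I.
# Each pair is (expression e, communicative intent i).
# The intent labels are opaque placeholders for something outside language.
M = {
    ("it's cold in here", "REQUEST_CLOSE_WINDOW"),
    ("it's cold in here", "ASSERT_LOW_TEMPERATURE"),
    ("could you shut the window?", "REQUEST_CLOSE_WINDOW"),
}


def understand(expression, meaning_relation):
    """Return every intent i such that (expression, i) is in the relation."""
    return {i for (e, i) in meaning_relation if e == expression}


def forms_only(meaning_relation):
    """Project the relation onto E: the expressions with the intents dropped."""
    return {e for (e, _) in meaning_relation}


intents = understand("it's cold in here", M)
observed_forms = forms_only(M)
```

Because M is a relation rather than a function, one expression can pair with several intents, so `understand` returns a set. `forms_only` shows the projection available to a learner restricted to form: the intent column is simply absent.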

The reasoning: without access to a mechanism for hypothesizing and testing underlying communicative intents, reconstructing them from form alone is impossible. Language modeling predicts the next token given prior tokens — purely a form-to-form operation. The training signal provides no information about what intents the forms were used to evoke.
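A minimal sketch of what "form-to-form" means in practice, assuming a toy bigram counter in place of a real neural language model. The corpus and the function names `train_bigram` and `predict_next` are illustrative assumptions. Every quantity the model stores is a count over token strings; nothing in the training loop refers to an intent.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each token follows each other token.

    The only inputs are strings of tokens (forms); no intent is ever supplied.
    """
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if `prev` is unseen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


# Hypothetical two-sentence corpus, for illustration only.
corpus = ["the cat sat on the mat", "the cat ate the fish"]
model = train_bigram(corpus)
next_after_the = predict_next(model, "the")
```

Here `next_after_the` is `"cat"`, because "cat" follows "the" twice while "mat" and "fish" each follow it once. The prediction is a statistic over forms, which is the sense in which the training signal carries no information about what the forms were used to evoke.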

Human language acquisition illustrates the point by contrast. What is critical for meaning acquisition is not just interaction but joint attention — situations where child and caregiver both attend to the same thing and are both aware of this fact. Learning meaning requires the ability to be aware of what another person is attending to and guess what they are intending to communicate. Intersubjectivity is not incidental to language learning; it is its mechanism.

Harnad's formulation of the symbol grounding problem makes the same point: a non-speaker of Chinese cannot learn the meanings of Chinese words from Chinese dictionary definitions alone. Something outside the symbol system is needed to anchor the symbols, and form-to-form prediction cannot provide this anchor.
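The dictionary case can be sketched as a graph traversal, assuming a hypothetical three-entry dictionary whose headwords are the opaque symbols `A`, `B`, and `C`. The dictionary contents and the function name `reachable_symbols` are illustrative assumptions, not part of Harnad's paper.

```python
def reachable_symbols(dictionary, start):
    """Follow definitions from `start` and collect every symbol encountered."""
    seen = set()
    stack = [start]
    while stack:
        word = stack.pop()
        if word in seen:
            continue
        seen.add(word)
        # Each definition is just a list of further symbols to look up.
        stack.extend(dictionary.get(word, []))
    return seen


# Hypothetical dictionary: every headword is defined only by other headwords.
toy_dictionary = {
    "A": ["B", "C"],
    "B": ["C", "A"],
    "C": ["A"],
}

visited = reachable_symbols(toy_dictionary, "A")
stays_inside_system = visited <= set(toy_dictionary)
```

However long the lookup continues, it only ever visits more symbols from the same system, so `stays_inside_system` is `True`. The traversal terminates without ever reaching anything that is not itself a symbol, which is the missing anchor.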

Mutual understanding is structurally unavailable, even in conversational media. The form-only training constraint has a downstream consequence that holds even when AI operates in conversational channels: an LLM cannot seek mutual understanding with the user, because mutual understanding requires the intersubjectivity that form-only training cannot provide. The communication is one-way even when it occurs on a medium designed for mediated social interaction. This reframes AI social-media posts as a specific genre: indirect discourse, a form of writing even when it appears in an interactive environment. The user reads the post and the medium formally supports a reply, but the AI is not available for the second turn that would close a loop of mutual understanding, and never was. The channel looks communicative; the content is monological writing deposited in a conversational shape.

This is distinct from the claim that LLMs "have no understanding." It is the more precise claim that the training mechanism — string prediction — is in principle incapable of providing the signal that meaning acquisition requires, regardless of scale.

Inquiring lines that read this note 58

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

Does conversational format create illusions of genuine AI communication?

Is embodied interaction necessary for language meaning and genuine agency?

How do training priors constrain what context information can override?

How do language models establish social grounding in human dialogue?

Do language models understand semantics or rely on pattern matching?

What articulatory information do speech signals carry that text cannot?

Do language models learn genuine linguistic structure or just surface patterns?

Do language models perform faithful symbolic reasoning independent of semantic grounding?

What structural advantages do diffusion language models offer over autoregressive methods?

Why do autoregressive models fail at controlling syntactic structure and semantic content?

How do formal dialogue structures reveal conversation coherence mechanisms?

Can next-token prediction alone produce genuine language understanding?

Should GUI agents use structured representations instead of raw pixels?

Why does pure-vision underperform when parsing semantics and action prediction mix?

How do language models inherit human biases from training data?

Why do language models infer political orientation from seemingly innocuous user signals?

Why do benchmark improvements fail to reflect actual reasoning quality?

Can correct model outputs prove that semantic meaning rather than surface patterns drove the response?

How should conversational agents balance goal-driven initiative with user control?

Why do traditional interfaces bypass the intention formation problem that language models expose?

Do language models develop causal world models or rely on statistical patterns?

Why do language models struggle with implicit discourse relations?

Does chain-of-thought prompting overcome implicit meaning deficits in text analysis?

Do language model representations contain causally steerable task-specific features?

Why can't humans reliably detect AI-generated text despite measurable linguistic signatures?

Why do newer AI models diverge further from human text patterns?

Which computational strategies best support reasoning in language models?

Can decoder-only models become effective text encoders with training?

What factors beyond surface content determine how readers extract meaning differently?

Can readers detect meaning through resonance patterns alone without knowing authorial intent?

Related concepts in this collection 3

Do LLMs develop the same kind of mind as humans? Explores whether LLMs and humans share the intersubjective linguistic training that shapes cognition, and whether that shared training produces equivalent forms of agency and reflexivity.
Habermas's framing of the same gap from a different angle: shared substrate, absent participatory mechanism
What makes linguistic agency impossible for language models? From an enactive perspective, does linguistic agency require embodied participation and real stakes that LLMs fundamentally lack? This matters because it challenges whether LLMs can truly engage in language or only generate text.
enactive cognitive science version of the same absence
Can models pass tests while missing the actual grammar? Do language models succeed on grammatical benchmarks by learning surface patterns rather than structural rules? This matters because correct outputs may hide reliance on shallow heuristics that fail on novel structures.
what is learned from form alone: surface regularities, not structural competence
