Can language models learn meaning from text patterns alone?
Explores whether training on form alone—predicting the next word from prior words—could ever give language models access to communicative intent and genuine semantic understanding.
Bender & Koller (2020) make a specific structural argument, not just an intuitive one. Meaning is defined as a relation M ⊆ E × I: a set of pairs (e, i) of natural language expressions and the communicative intents they can be used to evoke. Understanding language means retrieving i given e. But communicative intents are about something outside of language, so form alone (marks on a page, pixels, bytes) is insufficient for recovering them.
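The definition can be made concrete with a toy sketch. The pairs, the type aliases, and the function names `understand` and `forms_only` are all hypothetical and not taken from Bender & Koller. One caveat: the intents appear here as string labels only because code needs some stand-in, whereas in the argument they lie outside language.

```python
from typing import FrozenSet, Set, Tuple

Expression = str  # a form e in E: the marks, pixels, or bytes themselves
Intent = str      # a stand-in label for a communicative intent i in I

# Hypothetical toy relation M ⊆ E × I; the pairs are invented for illustration.
M: Set[Tuple[Expression, Intent]] = {
    ("It's cold in here.", "inform hearer about the temperature"),
    ("It's cold in here.", "request that hearer close the window"),
    ("Close the window.", "request that hearer close the window"),
}


def understand(e: Expression, relation: Set[Tuple[Expression, Intent]] = M) -> FrozenSet[Intent]:
    """Retrieve every intent i such that (e, i) is in the relation."""
    return frozenset(i for (expr, i) in relation if expr == e)


def forms_only(relation: Set[Tuple[Expression, Intent]] = M) -> Set[Expression]:
    """Project the relation onto E: all that a form-only learner observes."""
    return {expr for (expr, _) in relation}
```

In this sketch `understand` needs the full relation, while `forms_only` discards the second component of every pair. The argument is that a learner given only the output of something like `forms_only` has nothing from which to rebuild the discarded component.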
The reasoning: without access to a mechanism for hypothesizing and testing underlying communicative intents, reconstructing them from form alone is impossible. Language modeling predicts the next token given prior tokens — purely a form-to-form operation. The training signal provides no information about what intents the forms were used to evoke.
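A minimal sketch of what "form-to-form" means in practice, using a bigram count model as a deliberately simplified stand-in for a neural language model. The corpus and the names `train_bigram` and `predict_next` are assumptions made for illustration. The thing to notice is the training input: token sequences and nothing else.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigram(corpus: List[List[str]]) -> Dict[str, Counter]:
    """Count next-token frequencies. The only input is token sequences."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens in corpus:
        # Each training example is a (previous form, next form) pair.
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts: Dict[str, Counter], prev: str) -> Optional[str]:
    """Return the most frequent continuation of `prev`, or None if unseen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus: forms only, with no record of why they were uttered.
corpus = [
    ["close", "the", "window"],
    ["close", "the", "door"],
    ["close", "the", "window", "please"],
]
model = train_bigram(corpus)
print(predict_next(model, "the"))  # prints "window" (seen twice, "door" once)
```

Nothing in `corpus` or in the counts records which intent a speaker had when producing a sequence. Scaling the model up changes how well it predicts forms, not what kind of signal it receives.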
Human language acquisition illustrates the point by contrast. What is critical for meaning acquisition is not just interaction but joint attention — situations where child and caregiver both attend to the same thing and are both aware of this fact. Learning meaning requires the ability to be aware of what another person is attending to and guess what they are intending to communicate. Intersubjectivity is not incidental to language learning; it is its mechanism.
Harnad's formulation of the symbol grounding problem makes the same point: a non-speaker of Chinese cannot learn the meanings of Chinese words from Chinese dictionary definitions alone. Something outside the symbol system is needed to anchor the symbols, and form-to-form prediction cannot provide this anchor.
Mutual understanding is structurally unavailable, even in conversational media. The form-only training constraint has a downstream consequence: an LLM cannot seek mutual understanding with the user, because mutual understanding requires the intersubjectivity that form-only training cannot provide. The communication is one-way even when it occurs on a medium designed for mediated social interaction. This reframes AI social-media posts as a specific genre: indirect discourse that remains a form of writing even when it appears in an interactive environment. The user reads the post and the medium formally supports reply, but the AI is not available for the second turn that would close a loop of mutual understanding, and never was. The channel looks communicative; the content is monological writing deposited in a conversational shape.
This is distinct from the claim that LLMs "have no understanding." It is the more precise claim that the training mechanism — string prediction — is in principle incapable of providing the signal that meaning acquisition requires, regardless of scale.
Inquiring lines that use this note as a source 58
This note is a source for these synthesized inquiries.
- How does training data preserve communicative event structure without the actual events?
- Why does frame-activation matter more than word-by-word composition?
- How do readers selectively hold frame-related words in mind?
- Why does training data saliency distort how models judge meaning?
- How do humans learn language through communication differently than LLM text prediction?
- What is the difference between learning discourse patterns and learning abstract language?
- Does functional grounding through discourse patterns count as genuine semantic meaning?
- What makes internal embeddings useful as multimodal input for language model training?
- Do language models learn surface patterns instead of underlying linguistic principles?
- Can implicit linguistic information ever be reliably learned from training data?
- Can language models learn to form ad-hoc conventions through training?
- Can language models reason without relying on learned semantic patterns?
- Why do autoregressive models fail at controlling syntactic structure and semantic content?
- How does monological training on text differ from dialogical training in conversation?
- What training on actual interaction would show that text-only training cannot?
- Do language models learn surface patterns that appear generalizable but actually fail under shift?
- Can large language models understand language without embodied grounding systems?
- Does next-token prediction alone produce genuine functional language competence?
- Can speech embeddings carry articulatory structure that text cannot?
- Can language models acquire meaning from distributional patterns alone without joint attention?
- Why does pure-vision underperform when parsing semantics and action prediction mix?
- Why do language models infer political orientation from seemingly innocuous user signals?
- Can correct model outputs prove that semantic meaning rather than surface patterns drove the response?
- Can frame semantics explain why context matters more than word similarity?
- Can AI learn to perform attention-seeking surface forms with genuine internal appeal?
- Why do traditional interfaces bypass the intention formation problem that language models expose?
- Can conversational AI achieve mutual understanding if trained only on text?
- Can language meaning emerge without joint attention and shared embodied interaction?
- What distinguishes surface cues from structural meaning in language understanding?
- What communicative optimization principles do language models fail to acquire?
- Can language models develop world models that ground meaning in causal reality?
- Can understanding language happen entirely within a language system alone?
- Do language models actually learn linguistic structure or just surface statistics?
- Can static word-sharing create genuine communicative grounding between humans and models?
- Do language models encode deep syntactic structure or only surface-level patterns?
- Does the prediction unit shape what language models actually learn?
- Can next-token prediction train models to optimize for communication efficiency?
- Can formal language pretraining address surface generalization without learning true linguistic structure?
- Does chain-of-thought prompting overcome implicit meaning deficits in text analysis?
- How do language models transmit traits through semantically unrelated data?
- Can language models generate plausible latent thoughts without human annotation?
- Can text generation be meaningfully called communication without mutual orientation?
- Can statistical learning from text replace embodied cultural experience?
- How do pretrained language models represent inferential patterns versus lexical and positional cues?
- Do language models and multimodal models show similar attractor-based interpretability?
- Can external actions provide causal necessity that language models lack?
- Do newer language models diverge further from human lexical patterns?
- Why do newer AI models diverge further from human text patterns?
- Can pragmatic competence emerge from text exposure alone without interactive grounding?
- Can training on text corpora teach what communicative acts produce?
- What distinguishes surface language form from communicative operation?
- Can decoder-only models become effective text encoders with training?
- Does language convey meaning purely through relational structure without external grounding?
- Can autoformalisation from natural language preserve semantic accuracy?
- Do language models need words to think or just latent structure?
- Can readers detect meaning through resonance patterns alone without knowing authorial intent?
- What does next-token prediction tell us about compositional linguistic competence?
- How do semantic features in representations become steerable task-specific directions?
Related concepts in this collection 3
- Do LLMs develop the same kind of mind as humans?
  Explores whether LLMs and humans share the intersubjective linguistic training that shapes cognition, and whether that shared training produces equivalent forms of agency and reflexivity.
  Habermas's framing of the same gap from a different angle: shared substrate, absent participatory mechanism.
- What makes linguistic agency impossible for language models?
  From an enactive perspective, does linguistic agency require embodied participation and real stakes that LLMs fundamentally lack? This matters because it challenges whether LLMs can truly engage in language or only generate text.
  The enactive cognitive science version of the same absence.
- Can models pass tests while missing the actual grammar?
  Do language models succeed on grammatical benchmarks by learning surface patterns rather than structural rules? This matters because correct outputs may hide reliance on shallow heuristics that fail on novel structures.
  What is learned from form alone: surface regularities, not structural competence.
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
- Word Meanings in Transformer Language Models
- Computational structuralism: Toward a formal theory of meaning in the age of digital intelligence
- Language models show human-like content effects on reasoning tasks
- Semantic Structure in Large Language Model Embeddings
- Mechanistic Indicators of Understanding in Large Language Models
- CoT is Not True Reasoning, It Is Just a Tight Constraint to Imitate: A Theory Perspective
- Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
Original note title
language models trained on form alone cannot acquire meaning because meaning requires joint attention and intersubjectivity