What is the difference between learning discourse patterns and learning abstract language?
This explores what LLMs actually pick up from training text — the social patterns of who-says-what-when (discourse), versus the rules of language stripped of context (abstract grammar/meaning).
Learning discourse patterns means learning the socially situated business of which speakers say which things in which situations. Learning language in the abstract means learning grammar, meaning, and the principles that govern form. The corpus comes down firmly on one side: LLMs trained on web text acquire culturally situated discourse, not abstract language (Do language models learn abstract grammar or cultural speech patterns?). That reframing explains a lot: when a model reproduces a persona or a social position, it isn't applying grammar; it's replaying the patterned speech of a community.
Why can't a model cross over into the abstract? Because the abstract layer isn't actually present in the training signal. Bender & Koller's argument runs through the corpus: meaning lives in the relation between expressions and communicative intent, and a model trained on form-to-form prediction never sees intent, so it can't reconstruct meaning from form alone (Can language models learn meaning from text patterns alone?). A complementary finding sharpens the point: models readily absorb statistical regularities that can be read off the text surface (priming, sound symbolism) but miss the communicative *principles* that explain why language has the forms it does, such as word-length economy and discourse inference, because the 'why' was never a trainable signal (Why do language models fail at communicative optimization?).
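To make "form-to-form prediction" concrete, here is a minimal sketch, assuming a toy bigram counter in place of a neural model. It is not how any actual LLM is implemented, and the corpus and the function names `train_bigram` and `predict_next` are hypothetical. What it illustrates is that the training signal consists only of which token follows which; nothing about what a speaker intended ever enters the counts.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count next-token frequencies. The only input is token sequences (form)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        # Each training example is a (previous token, next token) pair.
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent continuation seen in training, or None."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, invented for illustration only.
corpus = [
    "the door is open",
    "the door is open",
    "the door is closed",
]

model = train_bigram(corpus)
print(predict_next(model, "is"))      # open
print(predict_next(model, "window"))  # None (never seen in training)
```

The sketch prefers "open" after "is" purely because that continuation was more frequent in the text, not because it has any access to the state of a door or to what the writer meant.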
The same split shows up in reasoning, which is where it gets interesting. Chain-of-thought looks like abstract inference but is constrained imitation of reasoning *form*: familiar schemata are replayed, and performance degrades predictably when the distribution shifts (Does chain-of-thought reasoning reveal genuine inference or pattern matching?). And LLMs (ChatGPT, in the cited note) handle causal reasoning better than temporal reasoning precisely because causal connectives are explicit and frequent in text, while temporal order is often implicit and must be inferred from context (Why do LLMs handle causal reasoning better than temporal reasoning?). In both cases the model is strong where the pattern is on the surface of discourse and weak where it would have to abstract.
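The contrast between explicit and implicit relations can be illustrated with a small sketch, assuming a hypothetical and deliberately tiny list of causal connectives and two made-up sentences. A surface matcher finds the causal link when it is written out, but has nothing to match when the order of events is carried only by sentence sequence.

```python
import re

# Hypothetical, deliberately incomplete marker list, for illustration only.
CAUSAL_MARKERS = ("because", "therefore", "so that", "as a result")


def explicit_causal_markers(text):
    """Return the causal connectives that literally appear in the text."""
    lowered = text.lower()
    return [
        marker
        for marker in CAUSAL_MARKERS
        if re.search(r"\b" + re.escape(marker) + r"\b", lowered)
    ]


# Made-up example sentences.
causal = "The bridge closed because the river flooded."
temporal = "The river flooded. The bridge closed."

print(explicit_causal_markers(causal))    # ['because']
print(explicit_causal_markers(temporal))  # []
```

In the second text a reader infers that the flood came first, but no token states it, which is the kind of relation the cited note describes as implicit.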
Here's the twist you might not expect: abstraction *can* be taught, but only when you make it concrete in the text. Training on formal-language prototypes like Prolog and PDDL improved cross-domain reasoning (by 4.7% on logical reasoning, 6.3% on planning, and 4.0% on general reasoning), because those representations put the abstract structure on the page where the surface-pattern learner can grab it (Do formal language prototypes improve reasoning across different domains?). The same logic explains why argument-scheme classification stalls: it demands recognizing inferential patterns spread across distributed text spans rather than local surface features, and models plateau exactly there (Why does argument scheme classification stumble where other NLP tasks succeed?). The pattern across the whole corpus is consistent: what reads as 'abstract language' is really just discourse structure that happens to be explicit enough to imitate.
Sources (7 notes)
LLMs trained on web text acquire socially contextualized linguistic action—which speakers make which statements in response to which situations. They model cultural discourse rather than language in the abstract sense, which explains why they reproduce social positions and personas.
Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.
LLMs successfully replicate statistical regularities learnable from text distributions (sound symbolism, priming) but fail at principles requiring pragmatic optimization (word length economy, discourse inference). The gap reveals that communicative logic—why language has certain forms—isn't present as a trainable signal.
CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.
ChatGPT excels at causal relations but struggles with temporal ordering because causal connectives are explicit and frequent in training data, while temporal order is often implicit and must be inferred contextually.
Training on Prolog and PDDL representations improved logical reasoning by 4.7%, planning by 6.3%, and general reasoning by 4.0%. Models exposed to prototype languages generalized better to structurally similar problems than models trained on natural language alone.
Scheme classification requires recognizing inferential patterns across distributed text spans, not local surface features. Models plateau at F1 0.55–0.65 while the same systems exceed 0.80 on component tagging and stance, suggesting the integrative reasoning demand is fundamentally different.