INQUIRING LINE

Can frame semantics explain why context matters more than word similarity?

This reads 'frame semantics' as the idea that a word's meaning is fixed by the context/situation it sits in rather than its surface resemblance to other words — and asks whether that theory accounts for why context should beat word-level similarity. The corpus doesn't name frame semantics directly, but it maps the exact battleground: when do language models follow context versus surface form?


This explores whether the linguistic intuition behind frame semantics — meaning comes from the situation a word sits in, not from how textually similar it is to other words — is borne out in how language models actually behave. The honest answer from the corpus is a twist: it documents in detail *why context often loses to surface similarity*, which is the failure frame semantics is supposed to prevent. So rather than confirming the theory, the collection shows you the gap between what context-driven meaning should do and what models default to.

The starkest evidence runs against context. Models systematically prefer high-frequency phrasings over semantically identical rare paraphrases across math, translation, and reasoning: they are tracking statistical mass from pretraining, not the meaning a frame would assign (see 'Do language models really understand meaning or just surface frequency?'). And when context conflicts with strong prior associations baked in during training, the priors win: prompting alone can't make a model honor what's in front of it, because parametric knowledge overrides the in-context signal (see 'Why do language models ignore information in their context?'). A related effect, keyword priming after learning, is even predictable from the keyword's probability before learning (see 'Can we predict keyword priming before learning happens?'). If frame semantics says context should dominate, these notes show the architecture often pulls the other way.

Where the corpus comes closest to the frame-semantic mechanism is the most interesting place to go next. Models treat presupposition triggers and non-factive verbs, exactly the words whose meaning *flips* depending on the surrounding frame ('pretended to,' 'forgot that'), as flat surface cues, failing to compute their opposite effects on what's entailed (see 'Why do embedding contexts confuse LLM entailment predictions?'). That's a clean demonstration that the models are doing word-similarity matching where a frame would force a structural reinterpretation. The breakdown isn't random either: reasoning fails at instance-novelty boundaries, because models fit patterns from similar examples rather than the generalizable structure a frame provides (see 'Do language models fail at reasoning due to complexity or novelty?').
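The contrast between surface matching and frame-driven reinterpretation can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than something taken from the cited notes: the `SIGNATURES` table is a simplified, textbook-style mapping from three embedding verbs to their effect on the embedded clause, and `token_overlap` and `frame_label` are hypothetical helper names. It is not a model of how any LLM works; it only shows why a word-overlap score cannot separate sentences that a verb-sensitive analysis treats differently.

```python
from typing import Dict

# Hypothetical, simplified entailment signatures for a few embedding verbs.
# "entailed": the embedded clause follows from the sentence.
# "contradicted": the negation of the embedded clause follows.
# "unknown": neither follows.
SIGNATURES: Dict[str, str] = {
    "knew that": "entailed",
    "pretended that": "contradicted",
    "believed that": "unknown",
}


def token_overlap(premise: str, hypothesis: str) -> float:
    """Surface cue: fraction of hypothesis tokens that also occur in the premise."""
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    return sum(t in premise_tokens for t in hypothesis_tokens) / len(hypothesis_tokens)


def frame_label(premise: str, complement: str) -> str:
    """Structural cue: label the complement by the verb that embeds it."""
    lowered = premise.lower()
    for verb, label in SIGNATURES.items():
        if f"{verb} {complement.lower()}" in lowered:
            return label
    return "unknown"


complement = "the door was locked"
premises = [
    "Sam knew that the door was locked",
    "Sam pretended that the door was locked",
    "Sam believed that the door was locked",
]

for premise in premises:
    overlap = token_overlap(premise, complement)
    label = frame_label(premise, complement)
    print(f"{overlap:.2f}  {label:<12}  {premise}")
```

All three premises get the same overlap score of 1.00 against the embedded clause, yet the signature lookup assigns them three different labels. A system that leans on the first kind of signal has no basis for telling them apart, which is the failure pattern the note describes.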

The deeper theory layer reframes why this happens at all. One line argues LLMs operationalize Saussure's *langue*: meaning purely as relational position among words, learned by compressing text with no external referent (see 'Can language models learn meaning without engaging the world?'). That is, in a sense, word-similarity *as* meaning. Against it stands Bender and Koller's argument that form alone can't yield meaning, because meaning needs the relation between expressions and communicative intent (see 'Can language models learn meaning from text patterns alone?'), alongside the counter-evidence that static embeddings already encode rich semantic content like valence and concreteness before attention even runs (see 'Do transformer static embeddings actually encode semantic meaning?'). Read together, these stake out whether context-meaning is something models construct or something they only approximate through relational similarity.

The thing worth walking away knowing: frame semantics predicts context *should* override surface similarity, but this collection's strongest finding is that current models frequently do the reverse — and the cases where they fail (embedding-blind verbs, novel instances, frequency preference) are precisely the cases a genuine frame would have caught. The theory explains what's missing more than it explains the models.


Sources (8 notes)

Do language models really understand meaning or just surface frequency?

LLMs show a consistent preference for higher-frequency surface forms over semantically equivalent rare paraphrases across math, machine translation, commonsense reasoning, and tool calling. This suggests that the models' primary mechanism is tracking statistical mass from pretraining rather than recognizing meaning.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can we predict keyword priming before learning happens?

Pre-learning keyword probability strongly predicts post-learning priming across architectures and model sizes, with a ~10^-3 threshold separating contexts where priming occurs from those where it doesn't. Just 3 training exposures suffice to establish the effect.

Why do embedding contexts confuse LLM entailment predictions?

LLMs treat presupposition triggers and non-factive verbs as surface cues rather than computing their opposite semantic effects on entailments. This structural failure persists across prompts and models, suggesting models rely on surface patterns instead of structural analysis.

Do language models fail at reasoning due to complexity or novelty?

Large reasoning models (LRMs) don't break at complexity thresholds but at instance-novelty boundaries. They fit instance-based patterns rather than generalizable algorithms, so a reasoning chain succeeds, regardless of its length, if the model was trained on similar instances.

Can language models learn meaning without engaging the world?

Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.

Can language models learn meaning from text patterns alone?

Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.

Do transformer static embeddings actually encode semantic meaning?

Clustering analysis of RoBERTa embeddings reveals sensitivity to five psycholinguistic measures including valence, concreteness, iconicity, and taboo. This demonstrates that static embeddings function as genuine lexical entries containing semantic content before self-attention operates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a semantics researcher re-testing whether frame-semantic principles (context determines meaning over surface similarity) hold in modern LLMs. The question remains open: do newer models, training methods, or inference techniques now honor contextual reframing, or do they still default to statistical/parametric priors?

What a curated library found — and when (findings span 2022–2026; these are dated claims, not current truth):
• Models systematically prefer high-frequency phrasings over semantically identical rare paraphrases; parametric knowledge overrides in-context signals (~2024–2025).
• Presupposition triggers and non-factive verbs ('pretended', 'forgot') are treated as flat surface cues; models fail to compute their frame-dependent entailment reversals (~2023).
• Reasoning breakdowns are driven by instance-level unfamiliarity, not task structure — models fit patterns rather than generalizable frames (~2024).
• Models operationalize meaning as purely relational position among words (Saussurean langue) with no external referent (~2025–2026).
• Positional bias in in-context learning and recursive thought vectors suggest emerging mechanisms for deeper context integration (~2025–2026).

Anchor papers (verify; mind their dates):
• arXiv:2305.14785 (2023): Embedding blinds in linguistic inference.
• arXiv:2508.12863 (2025): Word meanings in transformers.
• arXiv:2602.06176 (2026): Reasoning failures (mechanisms).
• arXiv:2604.02176 (2026): Adam's Law on textual frequency.

Your task:
(1) RE-TEST EACH CONSTRAINT. For parametric-override and embedding-blind phenomena, probe whether retrieval-augmented generation, dynamic routing, or explicit frame-parsing modules (e.g., semantic role labeling layers) have since relaxed the failure. For high-frequency bias, check whether adaptive normalization, entropy-regularized training, or frequency-invariant objectives have shifted the regime. Separate the durable question (why do models struggle with pragmatic reframing?) from the perishable limitation (current architectures can't do it).
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — e.g., any evidence that recursive models or posterior inference of latent thought actually reconstruct frame structure.
(3) Propose 2 research questions that assume the regime has moved: (a) If newer models can now honor contextual reframing, what architectural or training feature enables it, and how do we test for genuine frame computation vs. statistical correlation? (b) Do multimodal or grounded-language models escape the Saussurean limitation, or do they too collapse meaning to relational similarity?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
