Do metaphors work by decoupling meaning from linguistic associations?
This explores whether understanding a metaphor requires pulling meaning *away* from the statistical word-associations a model learned from text — and what the corpus says about whether LLMs can actually do that.
The corpus suggests the question quietly contains its own answer: metaphor comprehension is hard for LLMs precisely *because* they can't easily decouple meaning from the word-associations they absorbed in training. One line of work reframes metaphors, idioms, and puns as a single task, recovering a literal meaning hiding inside a non-literal expression, and argues that the missing ingredient isn't more examples but better "semantic decoupling" (Can one model handle all types of figurative language?). So "decoupling" isn't a side effect of metaphor; on this view it's the core operation.
Why is that operation hard? Several notes point at the same mechanism from different angles. LLMs systematically prefer high-frequency surface phrasings over rarer but equivalent ones, tracking statistical mass from pretraining rather than meaning itself (Do language models really understand meaning or just surface frequency?). And when semantic content is stripped out of a reasoning task, performance collapses even with correct rules supplied: the models lean on token associations, not abstract structure (Do large language models reason symbolically or semantically?). A metaphor asks for the opposite move. In a metaphor like "time is money", that means setting aside the literal associations of "time" and "money" and mapping an abstract relation across domains. That's exactly where comprehension breaks: models handle conventional, lexicalized metaphors (already baked into the associations) but fail on novel literary ones that demand fresh conceptual mapping (Where does LLM metaphor comprehension actually break down?).
There's a deeper architectural reason hiding underneath. One note argues transformers read words additively — aggregating all tokens in weighted parallel — rather than *resonantly*, selectively suppressing the irrelevant senses of a word the way humans do when a frame snaps into place (Why do AI systems miss jokes and wordplay so consistently?). Decoupling meaning from associations requires that selective suppression: a pun or metaphor lives in choosing which sense to silence. Without it, the model can't isolate the figurative reading from the literal pull of the words. Relatedly, strong prior associations from training simply override what's in front of the model, so even explicit context can't redirect it (Why do language models ignore information in their context?).
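The contrast between additive reading and selective suppression can be made concrete with a toy sketch. This is not how any real transformer is implemented; the two senses of "bank" and their fit scores are hypothetical values chosen only to show the difference between a soft weighted blend (every sense keeps some weight) and a hard gate (the losing sense is silenced).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def additive_blend(scores):
    """Weighted parallel aggregation: every sense keeps a nonzero weight."""
    return softmax(scores)

def selective_suppression(scores):
    """Winner-take-all: the best-fitting sense gets weight 1, the rest get 0."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == best else 0.0 for i in range(len(scores))]

# Hypothetical fit scores for two senses of "bank" in a river context.
senses = ["financial institution", "river edge"]
scores = [1.0, 2.0]

blend = additive_blend(scores)
gated = selective_suppression(scores)

for name, b, g in zip(senses, blend, gated):
    print(f"{name:22s} additive={b:.3f} suppressed={g:.1f}")
```

In the blended version the irrelevant sense still contributes to the result, which is the "literal pull" described above; in the gated version it contributes nothing.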
The surprising twist is what this says about meaning in general. If LLMs operationalize Saussure's *langue*, learning meaning purely from the relational structure of text with no external referents (Can language models learn meaning without engaging the world?), then for them, meaning *is* linguistic association. There is no separate layer of meaning to decouple to. That reframes your question: metaphor may be the place where a purely associational system reveals its ceiling. The "Potemkin understanding" pattern is the symptom: models that can correctly *explain* a concept yet fail to apply it, because the explanation pathway and the use pathway are functionally disconnected (Can LLMs understand concepts they cannot apply?).
So: yes, metaphor works by decoupling meaning from surface associations — and that's exactly the operation current models are worst at. The thing you didn't know to ask: a frequency bias toward common, abstract phrasing means LLMs don't just fail at metaphor, they actively drift *away* from the specific, figurative, lower-frequency language metaphor depends on (Does word frequency correlate with semantic abstraction?).
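The drift toward abstraction can be sketched in a few lines. Everything here is an assumption for illustration: the word counts are invented, the three-word hypernym chain is hand-written rather than taken from WordNet, and the selection rule is a deliberately crude stand-in for a frequency bias.

```python
# Hypothetical corpus counts, invented for illustration only.
counts = {"dachshund": 40, "dog": 9000, "animal": 15000}

# Each word maps to its hypernym (the more general term); None marks the top.
hypernym = {"dachshund": "dog", "dog": "animal", "animal": None}

def frequency_preferring_paraphrase(word):
    """Climb the hypernym chain while the more general term is more frequent."""
    current = word
    while hypernym[current] is not None and counts[hypernym[current]] > counts[current]:
        current = hypernym[current]
    return current

print(frequency_preferring_paraphrase("dachshund"))
```

Under these assumed counts, a chooser that always prefers the more frequent option ends up at the most general term, losing the specificity that figurative language tends to rely on.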
Sources (9 notes)
The Diplomat dataset (4,177 dialogues) reframes metaphors, idioms, and puns as one pragmatic task: recovering literal meaning from non-literal expression. This framing suggests LLMs need better semantic decoupling ability, not more category-specific training data.
LLMs show a consistent preference for higher-frequency surface forms over semantically equivalent rare paraphrases across math, machine translation, commonsense reasoning, and tool calling. This suggests that their primary mechanism is tracking statistical mass from pretraining rather than recognizing meaning.
When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.
LLMs handle conventional, lexicalized metaphors but fail on novel literary metaphors requiring conceptual domain mapping. This degradation reveals a fundamental gap between pattern recognition and genuine semantic mapping.
Transformers integrate token information through weighted parallel aggregation rather than selective suppression of irrelevant words. This structural difference explains consistent failures with jokes, wordplay, and frame-dependent meaning—not knowledge gaps, but missing cognitive operations.
Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.
Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.
Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.
WordNet analysis shows hypernyms (general concepts) occur more frequently than hyponyms (specific ones). Combined with LLMs' frequency bias, this means preferring common paraphrases systematically drifts toward abstraction, erasing expert-level specificity.