Why does AI struggle with wordplay when it has access to word embeddings?
This explores why AI fumbles puns, jokes, and double meanings even though it stores rich semantic information about every word — the gap turns out to be in how it combines words, not what it knows about them.
The short version: the failure isn't a knowledge gap, it's a missing cognitive operation. The model's static word embeddings genuinely do encode meaning. Clustering analysis of RoBERTa embeddings shows they're sensitive to valence, concreteness, iconicity, even taboo, so the raw lexical material is all there before self-attention assembles any sentence (see "Do transformer static embeddings actually encode semantic meaning?"). The problem starts when those words have to be combined.
Wordplay depends on holding two frames at once and letting the right one win. Human meaning-making is *selectively resonant*: the mind lights up the words that cohere into a frame and suppresses the linguistically adjacent ones that don't, tracking frame-coherence rather than mere co-occurrence (see "Does the mind selectively activate frames from only some words?"). A transformer works differently: it integrates every token through weighted parallel aggregation, reading words additively rather than resonantly. On this account it has no dedicated mechanism for suppressing the irrelevant reading and spotlighting the punning one, which is exactly the operation a joke requires (see "Why do AI systems miss jokes and wordplay so consistently?"). So embeddings give you meaning *per word*; what's missing is selective frame-activation *across* words.
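The contrast between additive aggregation and selective suppression can be made concrete with a toy sketch. Everything here is an assumption made for illustration: the two-dimensional "sense space", the token vectors, the frame-coherence scores, the threshold, and the function names are all hypothetical, and this is not how any particular model computes attention. Real softmax weights can get very small, so the sketch exaggerates the difference; the point is only that a softmax blend never drops a token outright, while a hard selection does.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_aggregate(vectors, scores):
    """Blend every vector, weighted by softmax(scores). Nothing is dropped."""
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def selective_aggregate(vectors, scores, threshold):
    """Average only the vectors whose score clears the threshold."""
    kept = [v for v, s in zip(vectors, scores) if s >= threshold]
    dim = len(vectors[0])
    return [sum(v[i] for v in kept) / len(kept) for i in range(dim)]

# Hypothetical 2-d sense space: axis 0 = finance frame, axis 1 = river frame.
tokens = {"deposit": [1.0, 0.0], "loan": [1.0, 0.0], "river": [0.0, 1.0]}
vectors = list(tokens.values())
scores = [2.0, 2.0, 1.0]  # made-up frame-coherence scores, one per token

blended = weighted_aggregate(vectors, scores)
selected = selective_aggregate(vectors, scores, threshold=1.5)

print("blended :", blended)   # the river frame still contributes to the mix
print("selected:", selected)  # the river frame is suppressed entirely
```

With these made-up numbers the off-frame token still supplies roughly 16% of the blended vector, whereas the selective version returns a pure finance-frame vector. A hard threshold is only one way to model suppression and is chosen here for simplicity.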
What makes this interesting is that the same structural shape sits behind several other figurative-language failures. AI can detect irony as a pattern but badly miscalibrates how often it occurs, because ironic examples loom larger in training text than in real use: it recognizes the trick without sensing its rarity (see "Do language models overestimate how often irony appears?"). And there's a deeper echo in 'Potemkin understanding': models can explain a concept correctly, then fail to apply it, then even recognize the failure, a sign that explanation and execution run on functionally disconnected pathways (see "Can LLMs understand concepts they cannot apply?"). Wordplay is the live demonstration of that disconnect: knowing what 'bank' means in two senses is not the same as deploying both at the comic moment.
There's a tantalizing wrinkle, though. Under hard, out-of-distribution tasks, LLM hidden states actually do sparsify: activations narrow in a localized, systematic way that acts like a selective filter (see "Do language models sparsify their activations under difficult tasks?"). So the architecture isn't categorically incapable of selectivity; it just doesn't apply it where frame-coherence demands. Combine that with the finding that strong training-time associations routinely override what's actually in front of the model (see "Why do language models ignore information in their context?"), and the picture sharpens: a pun asks the model to favor a low-prior, context-specific reading over a high-prior literal one, which is precisely the contest its weighting tends to lose.
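One way to see why a low-prior reading is hard to recover is a small piece of arithmetic. The Bayesian framing below is an assumption brought in for illustration and is not taken from the cited notes; the prior and likelihood values are invented, and `posterior` is a hypothetical helper. It only shows how much contextual evidence is needed before a rare reading can beat a common one.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over readings: normalize prior * likelihood."""
    unnorm = {r: priors[r] * likelihoods[r] for r in priors}
    total = sum(unnorm.values())
    return {r: p / total for r, p in unnorm.items()}

# Invented numbers: the literal reading is far more common in training text,
# but the local context fits the punning reading four times better.
priors = {"literal": 0.95, "pun": 0.05}
likelihoods = {"literal": 0.2, "pun": 0.8}

post = posterior(priors, likelihoods)
winner = max(post, key=post.get)

# The context must favor the pun by more than the prior odds to flip the result.
prior_odds = priors["literal"] / priors["pun"]

print(post)
print("winner:", winner)
print("likelihood ratio needed to flip:", prior_odds)
```

In this sketch the literal reading still wins with a posterior of about 0.83, even though the context favors the pun four to one. The evidence would need to favor the pun by more than the prior odds (about 19 to 1 here) before the pun comes out ahead.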
The thing worth walking away with: 'has word embeddings' and 'gets the joke' are answers to two different questions. Embeddings solve storage; humor demands selection. The sources keep pointing at the same boundary, between systems that aggregate meaning and minds that resonate with it, which is why the most human-feeling failures of AI cluster around irony, metaphor, and play rather than around facts.
Sources (7 notes)
Clustering analysis of RoBERTa embeddings reveals sensitivity to five psycholinguistic measures including valence, concreteness, iconicity, and taboo. This demonstrates that static embeddings function as genuine lexical entries containing semantic content before self-attention operates.
Human meaning-making operates through selective frame activation: the mind holds frame-related words in tight resonance while ignoring linguistically adjacent but frame-unrelated words. This selectivity tracks frame-coherence, not co-occurrence frequency, and represents a cognitive operation that standard similarity computation cannot capture.
Transformers integrate token information through weighted parallel aggregation rather than selective suppression of irrelevant words. This structural difference explains consistent failures with jokes, wordplay, and frame-dependent meaning—not knowledge gaps, but missing cognitive operations.
GPT-4o assigns significantly higher irony scores than humans (p < .001), revealing that LLMs detect irony as a pattern but miscalibrate its prevalence because ironic examples are more salient in training data than in actual use.
Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.
As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.
Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.