Do language models overestimate how often irony appears?
This note explores whether LLMs systematically misread ironic intent in text, assigning higher irony scores than humans do. The gap suggests models learn irony patterns from training data without learning how often irony actually occurs in real communication.
GPT-4o can interpret ironic intent in emoji usage, but it systematically overestimates that intent relative to humans: the median irony score GPT-4o assigns is significantly higher than that of human raters (p < .001). LLMs detect irony as a category but miscalibrate its prevalence (Irony in Emojis: A Comparative Study of Human and LLM Interpretation).
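One way to check a gap like this is a permutation test on the difference in medians. The sketch below is an illustration only: the ratings are invented, the function name `median_gap_permutation_test` is hypothetical, and the note does not say which statistical test the study itself used.

```python
import random
from statistics import median


def median_gap_permutation_test(model_scores, human_scores, n_perm=10_000, seed=0):
    """One-sided permutation test: is the model's median score higher than the humans'?

    Returns (observed median gap, estimated p-value).
    """
    rng = random.Random(seed)
    observed = median(model_scores) - median(human_scores)
    pooled = list(model_scores) + list(human_scores)
    n_model = len(model_scores)

    hits = 0
    for _ in range(n_perm):
        # Shuffle the rater labels: under the null hypothesis they carry no information.
        rng.shuffle(pooled)
        gap = median(pooled[:n_model]) - median(pooled[n_model:])
        if gap >= observed:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Invented irony ratings on a 1-7 scale, for illustration only.
model_ratings = [5, 6, 6, 7, 5, 6, 7, 6, 5, 6]
human_ratings = [2, 3, 3, 4, 2, 3, 2, 4, 3, 3]

observed_gap, p_value = median_gap_permutation_test(model_ratings, human_ratings)
print(f"median gap = {observed_gap}, p = {p_value:.4f}")
```

A permutation test is used here because it makes no assumption about the shape of the rating distribution, which suits ordinal scores with many ties.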
This overestimation reveals something important about how LLMs process pragmatic meaning. Irony detection is a pattern-matching success: the model has learned which textual features correlate with ironic intent in its training data. But ironic patterns are over-represented in training data relative to their actual frequency in human communication, because ironic usage is more salient, more commented upon, more explicitly labeled than sincere usage. The model learns the pattern but not the base rate.
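The base-rate point can be made concrete with the standard prior-shift correction for a probabilistic classifier. The sketch below assumes the model's irony score behaves like a probability learned where irony made up `TRAIN_RATE` of the examples. Both rates and the function name `adjust_for_base_rate` are invented for illustration, not measured values or an existing API.

```python
def adjust_for_base_rate(score: float, train_rate: float, real_rate: float) -> float:
    """Re-weight a probability learned under train_rate so it reflects real_rate.

    Standard prior-shift correction: each class's posterior is scaled by the
    ratio of its new prior to its old prior, then the two are renormalized.
    """
    ironic = score * (real_rate / train_rate)
    sincere = (1.0 - score) * ((1.0 - real_rate) / (1.0 - train_rate))
    return ironic / (ironic + sincere)


# Hypothetical prevalence of irony: over-represented in training data,
# rarer in everyday communication. These numbers are assumptions.
TRAIN_RATE = 0.30
REAL_RATE = 0.05

for raw in (0.5, 0.7, 0.9):
    adjusted = adjust_for_base_rate(raw, TRAIN_RATE, REAL_RATE)
    print(f"raw score {raw:.2f} -> adjusted {adjusted:.3f}")
```

With these assumed rates, every raw score shrinks once the lower real-world rate is applied, which is the direction of the miscalibration described above. The pattern the model learned stays the same; only the prior changes.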
This is a specific instance of a broader calibration problem. As the related note "Why do preference models favor surface features over substance?" argues, training data artifacts systematically distort model judgments across multiple dimensions. Irony overestimation is the pragmatic version: the model's sense of "how often is this ironic?" is calibrated to training data saliency, not to real-world frequency.
The implication for literary analysis is significant. Literary irony is subtle, context-dependent, and often operates through understatement, which is exactly the opposite of the salient, explicitly marked irony that dominates training data. A model that over-reads ironic intent will find irony where an author intended none, and may miss genuine irony that operates through restraint rather than exaggeration. As the related note "Can language models adapt implicature to conversational context?" suggests, the failure to calibrate irony to context is part of a larger pattern: LLMs apply fixed pragmatic templates where communicative context should modulate interpretation.
Inquiring lines that use this note as a source (23)
This note is a source for these synthesized inquiries.
- Does AI struggle with poetry for the same reason it misses jokes?
- Can AI detect sense-of-nonsense the way human readers do?
- Can language models adapt irony detection to specific communicative contexts?
- What happens when LLMs analyze literary irony that relies on understatement?
- What percentage of natural language relies on plausible deniability through ambiguous phrasing?
- Do language models inherit gender bias from training data in grading tasks?
- Can implicit linguistic information ever be reliably learned from training data?
- How do discourse-level patterns reveal cognitive distortions better than individual statements?
- Can language models understand the implicit emotional intent behind questions?
- Why do language models infer political orientation from seemingly innocuous user signals?
- Can discourse communities collectively detect disruptions individual readers miss?
- Why does AI struggle with wordplay when it has access to word embeddings?
- Do language models show the same truth bias as humans?
- Do language models systematically overestimate accuracy on collective behavior tasks?
- Can AI systems detect deception by monitoring real-time linguistic style matching patterns?
- Why do language models overestimate irony likelihood in emoji use?
- Why do different readers extract different meanings from identical text?
- How does this pattern match false punditry in AI commentary?
- Can a system without an addressee ever truly tell a joke?
- Can AI models predict whether alignment reads as warmth versus mockery in different cultures?
- Can implicit association tests reveal LLM biases beneath trained responses?
- Can detectors trained for one task reliably perform differently on unexpected text sources?
- Can readers detect meaning through resonance patterns alone without knowing authorial intent?
Related concepts in this collection (3)
- Why do preference models favor surface features over substance?
  Preference models show systematic bias toward length, structure, jargon, sycophancy, and vagueness: features humans actively dislike. Understanding this 40% divergence reveals whether it stems from training data artifacts or architectural constraints.
  calibration bias from training data saliency
- Can language models adapt implicature to conversational context?
  Do large language models flexibly modulate scalar implicatures based on information structure, face-threatening situations, and explicit instructions, as humans do? This tests whether pragmatic computation is truly context-sensitive or merely literal.
  fixed pragmatic templates where context should modulate
- Why do speakers deliberately use ambiguous language?
  Explores whether ambiguity is a linguistic defect or a strategic tool speakers use for efficiency, politeness, and deniability. Matters because it challenges how we train language systems.
  irony operates through productive ambiguity between literal and intended meaning
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Irony in Emojis: A Comparative Study of Human and LLM Interpretation
- Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?
- ChatGPT Reads Your Tone and Responds Accordingly -- Until It Does Not -- Emotional Framing Induces Bias in LLM Outputs
- Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
- When Large Language Models are More Persuasive Than Incentivized Humans, and Why
- Large Language Models Can Infer Psychological Dispositions of Social Media Users
- Computational structuralism: Toward a formal theory of meaning in the age of digital intelligence
- Large Language Models Do Not Simulate Human Psychology
Original note title
LLM irony detection systematically overestimates ironic intent — calibration bias reveals pattern recognition without pragmatic understanding