Where does LLM metaphor comprehension actually break down?

Literary metaphors range from conventional (dead metaphors) to novel conceptual mappings. This research asks whether LLMs fail predictably as metaphors become more abstract and creative, and what that tells us about their semantic reasoning limits.

Synthesis note · 2026-03-26

These directions emerge from the convergence of findings across the vault. Each one is grounded in existing research and proposes a testable investigation.

1. The Metaphor Comprehension Spectrum. Where on the spectrum from dead metaphor ("table leg") to novel literary metaphor ("Memory, a jar of flies") does LLM comprehension break down? Conventional metaphors are lexicalized; novel metaphors require conceptual mapping between dissimilar domains. The metaphor extraction paper (Automatic Extraction of Metaphoric Analogies from Literary Texts) provides a dataset and a methodology; the pragmatic competence gap predicts the failure point.

2. The Rhetoric Analysis Paradox. Can LLMs identify rhetorical devices (anaphora, chiasmus, antithesis, litotes) in existing texts even though they cannot deploy them evaluatively? This tests whether recognition and production are dissociated for rhetoric, as the note "Can LLMs generate more novel ideas than human experts?" suggests. If LLMs can label a chiasmus but cannot explain why it is effective in context, that reveals the boundary between mechanical and meaningful analysis.

3. The Implicit Meaning Wall. Is there a fundamental ceiling on LLM literary analysis imposed by the implicit meaning deficit, and can chain-of-thought (CoT) prompting breach it? Three findings converge: 24% on implicit discourse relations, 32% on ambiguity recognition, and systematic failure on presuppositions. Given the note "Can language models actually analyze language structure?", CoT may enable explicit decomposition of implicit structure. If it does not, LLM literary analysis has a hard boundary.

4. Style as Surface vs. Style as Substance. Can LLMs distinguish between stylistic features that carry semantic weight and those that are merely conventional? Authorship attribution at 95% shows style detection works at pattern level. The question is whether LLMs can interpret why a style choice matters — moving from pattern recognition to semantic interpretation of formal features.

5. The Evaluative Stance Problem for Literary Criticism. Can LLMs be prompted or fine-tuned to produce genuine literary criticism, or does the absence of evaluative stance-taking make literary judgment structurally inaccessible? Given the note "Can models learn argument quality from labeled examples alone?", LLMs might produce literary criticism only when provided with explicit critical frameworks (New Criticism, reader-response theory) as scaffolding.

6. Cross-Text Analogical Reasoning. Can LLMs identify structural analogies between texts, for example recognizing that Kafka's Metamorphosis and Ovid's Metamorphoses share transformation-as-identity-crisis, or that Moby-Dick and The Old Man and the Sea explore obsession-futility through opposed scales? Given the note "Do large language models reason symbolically or semantically?", cross-text analogy, which is conceptual rather than lexical, is predicted to fail. But metalinguistic capabilities and compositional generalization at scale might help.

7. The Compression-Nuance Trade-off in Literary Language. Does LLM semantic compression systematically destroy the features that make literary language literary? Testable by having LLMs paraphrase poetry and measuring which dimensions of meaning survive versus collapse. If compression preserves denotation but destroys connotation, that quantifies the gap between understanding what a text says and what a text means.
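The measurement proposed in direction 7 could be sketched roughly as follows. This is a minimal illustration, not an established protocol: it assumes that each (poem, paraphrase) pair has already been rated by human annotators on a 0 to 1 preservation scale, and the dimension names (denotation, connotation, imagery, sound), the function names, and every number are hypothetical placeholders rather than results.

```python
from statistics import mean

# Hypothetical preservation ratings (0.0 = lost, 1.0 = fully preserved),
# one dict per (poem, paraphrase) pair. The dimension names and values are
# placeholders for illustration only, not data from any study.
ratings = [
    {"denotation": 0.9, "connotation": 0.4, "imagery": 0.5, "sound": 0.1},
    {"denotation": 0.8, "connotation": 0.3, "imagery": 0.6, "sound": 0.2},
    {"denotation": 1.0, "connotation": 0.5, "imagery": 0.4, "sound": 0.0},
]


def survival_by_dimension(rows):
    """Average the preservation rating for each dimension across all pairs."""
    dimensions = rows[0].keys()
    return {dim: mean(row[dim] for row in rows) for dim in dimensions}


def compression_gap(survival, kept="denotation", lost="connotation"):
    """Difference in mean survival between two dimensions of meaning."""
    return survival[kept] - survival[lost]


survival = survival_by_dimension(ratings)
gap = compression_gap(survival)

# List dimensions from best preserved to worst preserved.
for dim, score in sorted(survival.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{dim:12s} {score:.2f}")
print(f"denotation minus connotation gap: {gap:.2f}")
```

A large positive gap would be consistent with the hypothesis that paraphrase preserves what a text says while losing what it means; a real study would also need inter-annotator agreement and a human-paraphrase baseline before reading anything into the number.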

Inquiring lines that read this note 4

This note is a source for these research framings, grouped by the broader line of inquiry each explores. Scan the bold lines of inquiry; follow any specific question forward.

Do language models understand semantics or rely on pattern matching?

Original note title

seven research directions for LLM literary analysis — from metaphor comprehension spectra to compression-nuance trade-offs
