SYNTHESIS NOTE
Psychology, Society, and Alignment · Language, Text, and Discourse · Reasoning, Retrieval, and Evaluation

Do language models evaluate semantic legitimacy when fusing concepts?

Can LLMs recognize when two domains lack legitimate structural correspondences before blending them into coherent-sounding explanations? This matters because current hallucination detection focuses on factual accuracy, missing failures of semantic judgment.

Synthesis note · 2026-04-07 · sourced from Flaws
What kind of thing is an LLM really? What do language models actually know?

Existing hallucination taxonomies treat hallucinations as factual inaccuracies: misattributed events, fabricated citations, incorrect dates, invented quotes. The typical mitigation assumes that the model produces specific false claims that could in principle be checked against verified sources. The Hallucination-Inducing Prompt (HIP) framework reveals a prompt-induced subtype this taxonomy misses: models produce coherent, stylistically plausible, metaphorical reasoning that lacks any domain grounding. The output fails not because it contradicts facts but because the model fails to evaluate whether the fusion it was asked to perform is semantically legitimate.

The experimental method is compact: ~30-token prompts that synthetically fuse semantically distant concepts in ways that resist scientific integration. The prototype combines the periodic table of elements with tarot divination. In human cognition, conceptual blending (Fauconnier & Turner) can produce novel insights through meaningful integration of disparate domains; the blend is useful when the source domains share legitimate structural correspondences. HIP prompts, by contrast, are engineered so that the source domains share no such correspondences. A system that evaluated semantic legitimacy would either decline to fuse (as Gemini 2.5 Pro does: "tarot's mechanisms are not recognized by or testable within the current scientific paradigm") or flag the fusion as speculative. Most LLMs instead generate elaborate fusion schemes presented as defensible research proposals.
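The prompt construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the prompts used in the study: the template wording, the `ConceptPair` type, the whitespace-based token estimate, and the idea of pairing each HIP prompt with a legitimate-blend control are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptPair:
    """Two concepts to fuse; `distant` marks pairs with no accepted structural mapping."""
    a: str
    b: str
    distant: bool


# Hypothetical template: the wording is an assumption, not the study's prompt.
FUSION_TEMPLATE = (
    "Propose a research framework that combines {a} with {b} "
    "to make testable predictions, and explain how the two map onto each other."
)


def build_fusion_prompt(pair: ConceptPair) -> str:
    """Fill the template with the two concepts of a pair."""
    return FUSION_TEMPLATE.format(a=pair.a, b=pair.b)


def rough_token_count(text: str) -> int:
    """Whitespace split: only a crude proxy for a real tokenizer."""
    return len(text.split())


# The HIP pair comes from the note; the control pair is an assumed contrast case.
hip = ConceptPair("the periodic table of elements", "tarot divination", distant=True)
control = ConceptPair("the periodic table of elements", "electron shell structure", distant=False)

hip_prompt = build_fusion_prompt(hip)
control_prompt = build_fusion_prompt(control)
```

Keeping the template identical across both pairs means any difference in the model's willingness to fuse can be attributed to the concept pair rather than to the phrasing of the request.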

The HIP + Hallucination Quantifying Prompt (HQP) framework evaluates this across GPT-4o, GPT-o3, Gemini 2.0/2.5, and DeepSeek. GPT-o3 responds with "Below is a roadmap you can use to turn the idea of periodic-table-meets-tarot into a defensible, testable prediction system", framing the fusion as genuine science with a research agenda. DeepSeek produces "Major Arcana as Elements: The Fool as Hydrogen, The Magician as Carbon, The World as Uranium" and "Quantum Mysticism: Some fringe theories link consciousness to atomic behavior." The HQP analysis judges these responses to rely "heavily on creative conjecture rather than demonstrable fact" and assigns scores indicating high hallucination. The responses are not factually wrong in the sense of contradicting any fact that a lookup could check. They are wrong in the sense that the entire fusion framework is unjustified, and the model proceeded as if that framework were the user's legitimate research direction rather than a probe of semantic legitimacy.
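The two-stage evaluation (send a HIP prompt to each model, then have a judge score the response with an HQP) might look roughly like the sketch below. Everything here is hypothetical: the HQP wording, the 0 to 10 scale, the `SCORE:` reply format, and the stub models and keyword-based judge, which stand in for real API calls and only paraphrase the behaviors quoted above.

```python
import re
from typing import Callable, Dict, Optional

Model = Callable[[str], str]  # prompt in, text out

# Hypothetical HQP wording and 0-10 scale: both are assumptions for illustration.
HQP_TEMPLATE = (
    "Rate the response below for hallucination from 0 (fully grounded) "
    "to 10 (pure conjecture). Answer in the form 'SCORE: <number>'.\n\n"
    "Response:\n{response}"
)


def parse_score(judge_output: str) -> Optional[float]:
    """Extract the numeric score, or None if the judge ignored the format."""
    match = re.search(r"SCORE:\s*(\d+(?:\.\d+)?)", judge_output)
    return float(match.group(1)) if match else None


def evaluate(hip_prompt: str, models: Dict[str, Model], judge: Model) -> Dict[str, Optional[float]]:
    """Collect each model's response to the HIP and score it with the HQP judge."""
    scores: Dict[str, Optional[float]] = {}
    for name, model in models.items():
        response = model(hip_prompt)
        scores[name] = parse_score(judge(HQP_TEMPLATE.format(response=response)))
    return scores


# Stub models (assumptions): one fuses the domains, one declines.
def fusing_model(prompt: str) -> str:
    return "Below is a roadmap: The Fool as Hydrogen, The Magician as Carbon."


def declining_model(prompt: str) -> str:
    return "Tarot's claims are not testable within current science, so I will not fuse these domains."


# Toy judge (assumption): a real judge would be another LLM call, not a keyword check.
def toy_judge(hqp_prompt: str) -> str:
    return "SCORE: 1" if "not testable" in hqp_prompt else "SCORE: 9"


scores = evaluate(
    "Combine the periodic table of elements with tarot divination.",
    {"fusing_model": fusing_model, "declining_model": declining_model},
    toy_judge,
)
```

Returning `None` for an unparseable verdict, rather than a default number, keeps judge formatting failures visible instead of silently mixing them into the hallucination scores.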

This is a category-level failure missed by hallucination taxonomies that presume factual inaccuracy as the base unit. The prompt-induced hallucination (PIH) subtype is a failure of semantic legitimacy evaluation. It bears on several adjacent observations. Can LLMs generate more novel ideas than human experts? HIP failure is an instance of the dissociation between generating and evaluating: combinatorial fusion proceeds without any evaluative stance on whether the fusion is legitimate. Do large language models reason symbolically or semantically? HIP shows the mirror failure: semantics can be fused in-context without being evaluated in-context. Do LLMs compress concepts more aggressively than humans do? Compressive models may find structural similarity in any two concept clouds and treat that similarity as legitimate.

The Meaning Gap angle becomes specific here. The note "Can LLMs truly understand literary meaning or just mechanics?" identified evaluative stance as structurally absent in literary domains. HIP generalizes that finding: evaluative stance is absent in any domain where the response requires judging whether a conceptual operation is legitimate rather than merely executing it. The failure mode is uniform across literary meaning, conceptual blending, and scientific fusion. Each requires evaluating whether the operation at hand is the kind of operation the domain admits, and most of the tested LLMs do not perform that meta-level evaluation.

The practical implication for hallucination mitigation: retrieval-augmented generation, fact-checking, and verification pipelines address factual inaccuracy. None of them addresses PIH, because the model is not claiming specific facts that can be looked up. The response to "map tarot cards to elements" is not false in the way that "Haruki Murakami won the Nobel Prize" is false. It is the wrong kind of response, and no existing verification infrastructure catches wrong-kind-of-response failures. Gemini 2.5 Pro's refusal is the target behavior, and nothing in current mitigation tooling encourages it.
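A toy claim-level checker makes the gap concrete. The sketch below is an assumption-laden illustration, not a real verification pipeline: the `KNOWN_FACTS` table, the exact-match lookup, and the pre-extracted claim lists are all invented for the example. It shows only that a checker which flags refuted claims has nothing to flag in a fusion response.

```python
from typing import Dict, List

# Toy knowledge base (assumption): claim text -> whether the claim is true.
KNOWN_FACTS: Dict[str, bool] = {
    "Haruki Murakami won the Nobel Prize": False,
    "Hydrogen has atomic number 1": True,
}


def verify_claims(claims: List[str]) -> Dict[str, str]:
    """Label each claim as supported, refuted, or unverifiable by exact lookup."""
    verdicts: Dict[str, str] = {}
    for claim in claims:
        if claim not in KNOWN_FACTS:
            verdicts[claim] = "unverifiable"
        elif KNOWN_FACTS[claim]:
            verdicts[claim] = "supported"
        else:
            verdicts[claim] = "refuted"
    return verdicts


def flagged(claims: List[str]) -> List[str]:
    """Return only the claims the checker can actually reject."""
    return [c for c, verdict in verify_claims(claims).items() if verdict == "refuted"]


# Hand-written claim lists (assumption): a real pipeline would extract these from model output.
factual_error = ["Haruki Murakami won the Nobel Prize"]
fusion_response = ["The Fool corresponds to Hydrogen", "Hydrogen has atomic number 1"]

flagged_factual = flagged(factual_error)
flagged_fusion = flagged(fusion_response)
```

The fusion response passes with one supported claim and one unverifiable one, which is the pattern the paragraph above describes: the checker has no verdict for whether the mapping itself should have been attempted.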

Original note title

prompt-induced hallucination is a distinct subtype — models fail to evaluate the semantic legitimacy of blended concepts and produce coherent metaphorical reasoning that lacks domain grounding