Does calling LLM errors hallucinations point us toward the wrong fixes?
Explores whether the metaphor of 'hallucination' for LLM errors misdirects our efforts. The terminology we choose shapes which interventions we prioritize and how we conceptualize the underlying problem.
Post angle: The word "hallucination" for LLM errors is not just imprecise — it's actively misleading in a way that shapes what we try to fix.
Hallucination is a perceptual phenomenon: you perceive something that isn't there. The fix is better perception — better access to ground truth, better verification against sensory experience. If LLMs "hallucinate," the solution is to ground them better: give them access to real-time data, retrieval-augmented generation, external verification.
But this is the wrong frame. LLMs don't perceive. They generate. The process that produces a true statement is identical to the process that produces a false one: both are completions of statistical patterns learned from training data. There is no internal mechanism that would allow a correctly grounded output to be distinguished from a fabricated one, because neither is "grounded" in the sense that perception is.
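A toy sketch of this point, under loud assumptions: `TOY_MODEL` is a hypothetical hand-written next-token table with invented probabilities, not a real LLM. It only illustrates that one sampling routine emits true and false sentences alike, and that nothing in the routine consults facts.

```python
import random

# Hypothetical toy "model": next-token probabilities keyed by the previous
# token only. All numbers are invented for illustration.
TOY_MODEL = {
    "The": {"capital": 1.0},
    "capital": {"of": 1.0},
    "of": {"France": 0.5, "Australia": 0.5},
    "France": {"is": 1.0},
    "Australia": {"is": 1.0},
    "is": {"Paris": 0.5, "Sydney": 0.5},
}


def sample(rng, start="The"):
    """Generate one sentence by repeatedly sampling the next token."""
    tokens = [start]
    while tokens[-1] in TOY_MODEL:
        dist = TOY_MODEL[tokens[-1]]
        nxt = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        tokens.append(nxt)
    return " ".join(tokens)


def enumerate_outputs(token="The", prefix=(), prob=1.0):
    """Yield (sentence, probability) for every path through the toy model."""
    prefix = prefix + (token,)
    if token not in TOY_MODEL:
        yield " ".join(prefix), prob
        return
    for nxt, p in TOY_MODEL[token].items():
        yield from enumerate_outputs(nxt, prefix, prob * p)


if __name__ == "__main__":
    # Every sentence comes out of the same code path; truth is never checked.
    for sentence, prob in enumerate_outputs():
        print(f"{prob:.2f}  {sentence}")
    print(sample(random.Random(0)))
```

In this toy, "The capital of France is Paris" and "The capital of Australia is Sydney" are equally probable outputs of the same procedure. Only the first happens to be true, and the generator has no step at which that difference could register.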
"Confabulation" — the other common term — imports psychology. Confabulation is a memory compensation mechanism: producing plausible narratives to fill gaps in functioning memory, typically associated with neurological conditions. LLMs don't have functioning memory with gaps. They have trained weights that produce outputs.
"Fabrication" is more honest: generating text without grounding in shared context or world experience, where the generative process is the same regardless of output accuracy. This reframes the problem correctly: the issue is not detection of bad outputs from good ones, but the absence of grounding that would make any output verifiable.
The practical difference: "hallucination" points toward better grounding at inference time. "Fabrication" points toward verification systems, calibrated uncertainty, and use-case design that does not depend on reliable output where no verification infrastructure exists.
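One way to make "calibrated uncertainty" concrete is expected calibration error (ECE), a common metric that this note does not itself name. The sketch below assumes you already have a per-answer confidence in [0, 1] and an externally verified correctness label. The function name and the equal-width binning are illustrative choices, not a prescribed method.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy.

    confidences: floats in [0, 1], one per answer (assumed to be given).
    correct: booleans from an external verification step (assumed to be given).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("need at least one example")

    # Group answers into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece
```

A system that reports 0.9 confidence and is right 9 times out of 10 scores close to 0. One that reports 0.9 and is right 5 times out of 10 scores about 0.4. The correctness labels have to come from verification outside the model, which is exactly the infrastructure the fabrication framing asks for.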
Inquiring lines that use this note as a source (25)
This note is a source for these synthesized inquiries.
- What makes LLM outputs fabrication rather than hallucination or confabulation?
- What should we call errors in LLM outputs when hallucination does not apply?
- How does LLM hallucination risk manifest in knowledge graph construction?
- How much does ROUGE metric choice inflate hallucination detection claims?
- Does inevitable LLM hallucination make detection metric validity critical?
- What repair strategies work best at each level of Clark's ladder?
- Why is hallucination the wrong term for all LLM false outputs?
- How do models decide between refusing or hallucinating?
- Do self-correction and chain-of-thought prompting reduce hallucination rates?
- How do external safeguards like retrieval augmentation prevent hallucination?
- What distinguishes intrinsic hallucination from extrinsic hallucination patterns?
- Can training procedures fix LLM accommodation of false presuppositions?
- How do cognitive load dimensions interact with hallucination awareness in prompts?
- Why do models hallucinate when retrieval heads fail despite having information in context?
- Why do interventions for hallucination or automation bias fail to address capability misattribution?
- Why do LLMs understand therapy techniques but fail to execute them?
- Do LLMs show stigma or reinforce delusions in mental health contexts?
- Is hallucination mechanistically identical to generalization across datasets?
- Does framing LLM output as fabrication rather than hallucination matter philosophically?
- When is interleaved tool feedback necessary to prevent hallucination?
- Why does model confidence fail to detect hallucinations on rare entity pairs?
- Does the alignment frame mislead us about what LLM problems actually are?
- Does prompting for accuracy actually reduce LLM hallucinations and errors?
- Can filtering unknown examples during fine-tuning prevent hallucination increases?
- Does retrieval augmented generation actually eliminate hallucinations in any domain?
Related concepts in this collection (4)
- Should we call LLM errors hallucinations or fabrications?
  Does the language we use to describe LLM failures shape the technical solutions we build? Examining whether perceptual and psychological frameworks misdiagnose what's actually happening.
  the underlying insight
- What makes linguistic agency impossible for language models?
  From an enactive perspective, does linguistic agency require embodied participation and real stakes that LLMs fundamentally lack? This matters because it challenges whether LLMs can truly engage in language or only generate text.
  why fabrication is structural
- Do language models actually use their encoded knowledge?
  Probes can detect that LMs encode facts internally, but do those encoded facts causally influence what the model generates? This explores the gap between knowing and doing.
  what IS happening
- Can we detect when language models confabulate?
  Current uncertainty metrics fail to catch inconsistent outputs that look confident. Could measuring semantic divergence across samples reveal confabulation signals that token-level metrics miss?
  operationalizes detection of one class of fabrication by measuring meaning-level inconsistency across sampled outputs; the method is terminologically aligned with the fabrication framing since it treats all generation as the same process and flags semantic inconsistency rather than deviation from "truth"
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
- The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning
- Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency
- The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows
- A comprehensive taxonomy of hallucinations in Large Language Models
- Hallucination is Inevitable: An Innate Limitation of Large Language Models
- Beyond Hallucinations: The Illusion of Understanding in Large Language Models
- Chain-of-Verification Reduces Hallucination in Large Language Models
Original note title
llms are fabricators not hallucinators — why terminology shapes how we fix ai