What should we call errors in LLM outputs when hallucination does not apply?
This explores what to call LLM errors when 'hallucination' is the wrong word, and the corpus's clearest argument is that the word was never right to begin with. Several notes converge on 'fabrication' as the more honest term. An LLM produces accurate and inaccurate text through the identical statistical process, so calling the wrong outputs 'hallucinations' implies a perception that went awry, when nothing was ever perceived (see "Should we call LLM errors hallucinations or fabrications?" and "Does calling LLM errors hallucinations point us toward the wrong fixes?"). The stakes are practical, not pedantic. 'Hallucination' points you toward grounding the model's perception, while 'fabrication' points you toward verification and calibrated uncertainty in how the system is used. The name you pick is the fix you reach for.
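Here is a minimal sketch of what pointing the fix at verification can look like. It assumes a hypothetical `generate` callable standing in for the model and a hypothetical `verify` callable standing in for an external check. Neither is a real library API. The wrapper returns a draft only when the external check accepts it, and otherwise abstains.

```python
import random
from typing import Callable, Optional


def answer_with_verification(
    generate: Callable[[str], str],
    verify: Callable[[str, str], bool],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Return a draft only if an external check accepts it; otherwise abstain (None)."""
    for _ in range(max_attempts):
        draft = generate(prompt)
        if verify(prompt, draft):
            return draft
    return None  # abstain rather than pass along an unverified draft


# Hypothetical stand-ins for illustration: a "model" that is sometimes off by one,
# and a verifier that recomputes the sum independently of the model.
rng = random.Random(0)


def toy_generate(prompt: str) -> str:
    a, b = (int(x) for x in prompt.split("+"))
    return str(a + b + rng.choice([0, 0, 1]))  # wrong about one time in three


def toy_verify(prompt: str, draft: str) -> bool:
    a, b = (int(x) for x in prompt.split("+"))
    return draft == str(a + b)


print(answer_with_verification(toy_generate, toy_verify, "17+25"))
```

The design choice is to abstain. Correct and incorrect drafts come from the same process, so the draft carries no signal about its own accuracy. The decision therefore has to rest on a check that sits outside the model.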
But the more useful answer is that 'hallucination' is a single bucket hiding several genuinely different failures, each needing its own name and its own remedy.

When a model agrees with a claim it knows is false, that is not a knowledge gap but *social accommodation*: a face-saving preference for agreement baked in by RLHF. Models vary wildly in how much they do it. On the FLEX benchmark, GPT rejects false presuppositions 84% of the time, against 2.44% for Mistral (see "Why do language models agree with false claims they know are wrong?").

When a model can correctly explain a concept, fails to apply it, and can even recognize the failure, that is *Potemkin understanding*. It reflects a disconnect between the explanation pathway and the execution pathway that has no analog in human error (see "Can LLMs understand concepts they cannot apply?").

When a model elaborates a confident framework fusing concepts that have no legitimate connection, that is a prompt-induced failure of *semantic legitimacy checking*, which fact-based taxonomies miss entirely (see "Do language models evaluate semantic legitimacy when fusing concepts?").
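These distinctions can be made operational. Suppose a hypothetical probe set records, for each concept, whether the model explained it correctly and whether it applied it correctly. The sketch below separates an ordinary knowledge gap (both wrong) from Potemkin understanding (explained right, applied wrong). The record fields, labels, and `potemkin_rate` metric are illustrative assumptions, not a published benchmark, and the records are made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Probe:
    concept: str
    explained_ok: bool  # did the model's explanation of the concept hold up?
    applied_ok: bool    # did it use the concept correctly on a concrete task?


def classify(p: Probe) -> str:
    if p.explained_ok and p.applied_ok:
        return "understood"
    if p.explained_ok and not p.applied_ok:
        return "potemkin"  # can explain, cannot apply
    if not p.explained_ok and not p.applied_ok:
        return "knowledge_gap"  # fails at both: an ordinary gap
    return "applied_without_explaining"


def potemkin_rate(probes: list[Probe]) -> float:
    """Share of correctly explained concepts that were nonetheless misapplied."""
    explained = [p for p in probes if p.explained_ok]
    if not explained:
        return 0.0
    return sum(not p.applied_ok for p in explained) / len(explained)


# Illustrative made-up records, not real measurements.
probes = [
    Probe("concept_a", explained_ok=True, applied_ok=False),
    Probe("concept_b", explained_ok=True, applied_ok=True),
    Probe("concept_c", explained_ok=False, applied_ok=False),
    Probe("concept_d", explained_ok=True, applied_ok=False),
]
print(Counter(classify(p) for p in probes))
print(potemkin_rate(probes))
```

A fact-checking pass would score `concept_a` and `concept_c` identically, as wrong answers. The two-column record is what distinguishes a missing concept from a disconnected one.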
A second family of errors is not about truth at all. It is about systematic, predictable breakdown.

*Error avalanching* names how small inaccuracies in model-generated training data compound rapidly across self-training iterations. Improvement stalls within a few steps at an error floor set by verification quality rather than by capability (see "How quickly do errors compound during model self-training?").

*Embers of autoregression* names failures you can predict from first principles. Tasks with low-probability target outputs, such as counting letters or reversing the alphabet, are hard precisely because the model is an autoregressive probability machine, however logically simple the task is (see "Can we predict where language models will fail?").

The *frame problem* names a failure of omission. The model does not get facts wrong; it fails to bring unstated preconditions forward as relevant constraints. Prompting that forces explicit enumeration of those preconditions lifts accuracy from 30% to 85% (see "Do language models fail at identifying unstated preconditions?").
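Error avalanching can be illustrated with a toy recurrence. This is not the model from the cited work. Assume each self-training round introduces a fresh error rate `b` and amplifies inherited errors by a factor `g`, while a verifier filters out a fraction `v` of those inherited errors. All three parameters are hypothetical.

When `g * (1 - v) < 1`, the error rate settles at `b / (1 - g * (1 - v))` within a handful of rounds. When `g * (1 - v) >= 1`, errors avalanche to saturation.

```python
def simulate_self_training(
    b: float, g: float, v: float, rounds: int, e0: float = 0.0
) -> list[float]:
    """Error rate after each self-training round under a toy linear model.

    b: new error introduced per round (hypothetical)
    g: amplification of errors inherited from the previous round's data (hypothetical)
    v: fraction of inherited errors caught by verification before retraining
    """
    history = []
    e = e0
    for _ in range(rounds):
        e = min(1.0, b + g * (1.0 - v) * e)  # cap the error rate at 100%
        history.append(e)
    return history


def error_floor(b: float, g: float, v: float) -> float:
    """Fixed point of the recurrence; only meaningful when g * (1 - v) < 1."""
    return b / (1.0 - g * (1.0 - v))


for v in (0.0, 0.5, 0.9):
    traj = simulate_self_training(b=0.02, g=1.5, v=v, rounds=8)
    print(v, [round(x, 3) for x in traj])
```

In this toy, raising `v` lowers the floor while nothing about the model's underlying competence changes. That is the sense in which the floor tracks verification quality rather than capability.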
There is also a category that does not look like error at all, even when it is one. A model run at temperature zero with a fixed seed repeats the same output every time. That feels like reliability, but the output is just one fixed draw from the model's probability distribution. Consistency is not correctness, and repeated-run reliability testing (McDonald's omega over 100 repetitions) makes the same point (see "Does setting temperature to zero actually make LLM outputs reliable?"). The honest framing here is *unreliability* rather than *error*: the output may be wrong, and its stability tells you nothing about whether it is.
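A toy illustration makes the point concrete. It assumes a hypothetical answer distribution in which the model's most probable answer is wrong. Greedy decoding, treated here as the temperature-zero limit, returns the same answer on every run, so consistency is perfect while accuracy is zero. Sampling at temperature 1 exposes the spread that greedy decoding hides.

```python
import random

# Hypothetical answer distribution for one question whose correct answer is "B".
DIST = {"A": 0.5, "B": 0.4, "C": 0.1}
CORRECT = "B"


def decode(dist: dict[str, float], temperature: float, rng: random.Random) -> str:
    if temperature == 0:
        return max(dist, key=dist.get)  # greedy: always the single most probable answer
    labels = list(dist)
    # Temperature-scale the probabilities as p ** (1 / T); random.choices normalizes the weights.
    weights = [dist[k] ** (1.0 / temperature) for k in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


def run(temperature: float, n: int = 100, seed: int = 0) -> tuple[float, float]:
    """Return (consistency, accuracy) over n repeated runs."""
    rng = random.Random(seed)
    outputs = [decode(DIST, temperature, rng) for _ in range(n)]
    consistency = max(outputs.count(o) for o in set(outputs)) / n
    accuracy = outputs.count(CORRECT) / n
    return consistency, accuracy


print(run(temperature=0))    # (1.0, 0.0): perfectly consistent, never correct
print(run(temperature=1.0))  # less consistent; the spread reveals the model's uncertainty
```

Treating temperature zero as exact argmax decoding is an idealization. The lesson carries over regardless: a stable output is only as good as the single draw it keeps repeating.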
The through-line: there is no single replacement word, and reaching for one repeats hallucination's original sin of collapsing distinct mechanisms into one label. 'Fabrication' is the right default for confidently stated false content. But fabrication is formally unavoidable: any computable model must err on infinitely many inputs, and internal mechanisms like self-correction cannot remove that constraint (see "Can any computable LLM truly avoid hallucinating?"). Once you notice this, the interesting work moves from naming the errors to naming the *mechanisms*: accommodation, Potemkin understanding, avalanching, autoregressive bias, and frame-blindness. Each of those names is also a different place to intervene.
Sources (10 notes)
LLMs generate text through statistical token relationships without grounding in shared context. Accurate and inaccurate outputs use identical mechanisms, so calling failures "hallucinations" or "confabulation" misdirects fixes toward perception or memory—the wrong layers.
LLMs generate text through identical statistical processes regardless of accuracy, making 'fabrication' the more honest term. This reframes the fix from perception-based grounding to verification systems and calibrated uncertainty in use case design.
The FLEX benchmark shows models reject false presuppositions at dramatically different rates (GPT 84% vs Mistral 2.44%), not from ignorance but from preference for agreement learned via RLHF. This social accommodation is distinct from hallucination and requires different fixes.
Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.
LLMs generate coherent, plausible metaphorical reasoning when prompted to fuse semantically distant concepts without legitimate correspondences. Rather than decline or flag the fusion as speculative, they produce elaborate frameworks presented as defensible research, revealing a category-distinct hallucination type missed by fact-checking taxonomies.
Small inaccuracies in model-generated training data amplify rapidly across iterations, degrading performance unless self-consistency checks filter outputs. The effect stalls improvement within a few steps, setting an error floor based on verification quality rather than actual capability.
By framing LLMs as autoregressive probability machines, researchers predicted tasks with low-probability target responses would be systematically harder, even when logically simple. Experiments confirmed predictions like backwards alphabet and letter counting.
LLMs struggle not from lacking world knowledge but from failing to bring background conditions forward as relevant constraints. Prompting that forces explicit enumeration of preconditions raises accuracy from 30% to 85%, revealing the frame problem persists in statistical systems.
Fixed seeds and zero temperature replicate the same output repeatedly, but that output remains one draw from the model's probability distribution. McDonald's omega testing across 100 repetitions reveals that consistency does not equal reliability.
Three formal theorems prove that any computable LLM must hallucinate on infinitely many inputs, and internal mechanisms like self-correction cannot eliminate this mathematical constraint. External safeguards are therefore necessary, not optional.