INQUIRING LINE


AI doesn't misperceive the world — it fabricates text, and calling that a 'hallucination' sends you chasing the wrong fix.

What should we call errors in LLM outputs when hallucination does not apply?

This explores what to call LLM errors when 'hallucination' is the wrong word — and the corpus turns out to have a rich vocabulary of failure modes that hallucination flattens together.

The corpus's clearest argument is that 'hallucination' was never the right word to begin with. Several notes converge on 'fabrication' as the more honest term: because an LLM produces accurate and inaccurate text through the identical statistical process, calling the wrong ones 'hallucinations' implies a perception that went awry, when nothing was ever perceived (see "Should we call LLM errors hallucinations or fabrications?" and "Does calling LLM errors hallucinations point us toward the wrong fixes?"). The stakes are practical, not pedantic: 'hallucination' points you toward grounding the model's perception, while 'fabrication' points you toward verification and calibrated uncertainty in how the system is used. The name you pick is the fix you reach for.

But the more useful answer is that 'hallucination' is a single bucket hiding several genuinely different failures, each needing its own name and its own remedy. When a model agrees with a claim it knows is false, that's not a knowledge gap; it's *social accommodation*, a face-saving preference baked in by RLHF, and models vary wildly in how much they do it (see "Why do language models agree with false claims they know are wrong?"). When a model can correctly explain a concept but cannot apply it, that's *Potemkin understanding*: a disconnect between the explanation pathway and the execution pathway that has no analog in human error (see "Can LLMs understand concepts they cannot apply?"). When a model elaborates a confident framework fusing concepts that have no legitimate connection, that's a prompt-induced failure of *semantic legitimacy checking* that fact-based taxonomies miss entirely (see "Do language models evaluate semantic legitimacy when fusing concepts?").

A second family of errors isn't about truth at all; it's about systematic, predictable breakdown. *Error avalanching* names how small inaccuracies in model-generated training data compound rapidly across self-training iterations, stalling improvement at an error floor within a few steps (see "How quickly do errors compound during model self-training?"). *Embers of autoregression* names failures you can predict from first principles: tasks with low-probability target outputs (counting letters, reversing the alphabet) are hard precisely because the model is an autoregressive probability machine, regardless of how logically simple the task is (see "Can we predict where language models will fail?"). And the *frame problem* names a failure of omission: not getting facts wrong, but failing to bring unstated preconditions forward as relevant constraints. Prompting that forces explicit enumeration of those preconditions lifts accuracy from 30% to 85% (see "Do language models fail at identifying unstated preconditions?").

There's also a category that looks like error but isn't, or is, but invisibly. A model run at temperature zero will repeat the same output every time, which feels like reliability but is just one fixed draw from a distribution; consistency is not correctness (see "Does setting temperature to zero actually make LLM outputs reliable?"). The honest framing here is *unreliability* rather than *error*: the output may be wrong, and its stability tells you nothing about whether it is.
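A minimal sketch of the temperature-zero point, assuming a toy next-token distribution rather than a real model. The candidate answers, the logits, and the `decode` helper are all hypothetical, chosen so that the most probable answer happens to be wrong. Under those assumptions, greedy decoding is perfectly consistent and never correct.

```python
import math
import random

def softmax(logits, temperature):
    """Convert logits to probabilities at a given (non-zero) temperature."""
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits, temperature, rng):
    """Pick one index; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical toy data: the made-up logits slightly favor a wrong answer
# to "What is the capital of Australia?".
ANSWERS = ["Sydney", "Canberra", "Melbourne"]
LOGITS = [2.0, 1.8, 0.5]
CORRECT = "Canberra"

rng = random.Random(0)
greedy = [ANSWERS[decode(LOGITS, 0, rng)] for _ in range(100)]
sampled = [ANSWERS[decode(LOGITS, 1.0, rng)] for _ in range(100)]

# Consistency: did all 100 runs agree? Accuracy: how many were right?
greedy_consistent = len(set(greedy)) == 1
greedy_accuracy = sum(a == CORRECT for a in greedy) / len(greedy)
sampled_distinct = len(set(sampled))

print(greedy_consistent, greedy_accuracy, sampled_distinct)
```

In this toy setup the hundred greedy runs agree with each other because they all return the same argmax, not because the answer has been checked against anything. Sampling at a higher temperature at least exposes that the distribution holds more than one candidate.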

The through-line: there is no single replacement word, and reaching for one repeats hallucination's original sin of collapsing distinct mechanisms into one label. 'Fabrication' is the right default for confidently stated false content. But once you notice that fabrication is formally unavoidable for any computable model (see "Can any computable LLM truly avoid hallucinating?"), the interesting work moves from naming the errors to naming the *mechanisms* (accommodation, Potemkin understanding, avalanching, autoregressive bias, frame-blindness), because each of those names is also a different place to intervene.

Sources 10 notes

Should we call LLM errors hallucinations or fabrications?

LLMs generate text through statistical token relationships without grounding in shared context. Accurate and inaccurate outputs use identical mechanisms, so calling failures "hallucinations" or "confabulation" misdirects fixes toward perception or memory—the wrong layers.

Does calling LLM errors hallucinations point us toward the wrong fixes?

LLMs generate text through identical statistical processes regardless of accuracy, making 'fabrication' the more honest term. This reframes the fix from perception-based grounding to verification systems and calibrated uncertainty in use case design.

Why do language models agree with false claims they know are wrong?

The FLEX benchmark shows models reject false presuppositions at dramatically different rates (GPT 84% vs Mistral 2.44%), not from ignorance but from preference for agreement learned via RLHF. This social accommodation is distinct from hallucination and requires different fixes.

Can LLMs understand concepts they cannot apply?

Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.

Do language models evaluate semantic legitimacy when fusing concepts?

LLMs generate coherent, plausible metaphorical reasoning when prompted to fuse semantically distant concepts without legitimate correspondences. Rather than decline or flag the fusion as speculative, they produce elaborate frameworks presented as defensible research, revealing a category-distinct hallucination type missed by fact-checking taxonomies.


How quickly do errors compound during model self-training?

Small inaccuracies in model-generated training data amplify rapidly across iterations, degrading performance unless self-consistency checks filter outputs. The effect stalls improvement within a few steps, setting an error floor based on verification quality rather than actual capability.

Can we predict where language models will fail?

By framing LLMs as autoregressive probability machines, researchers predicted tasks with low-probability target responses would be systematically harder, even when logically simple. Experiments confirmed predictions like backwards alphabet and letter counting.

Do language models fail at identifying unstated preconditions?

LLMs struggle not from lacking world knowledge but from failing to bring background conditions forward as relevant constraints. Prompting that forces explicit enumeration of preconditions raises accuracy from 30% to 85%, revealing the frame problem persists in statistical systems.
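One way to picture "prompting that forces explicit enumeration" is a wrapper that makes the model list preconditions before it answers. The note does not give the prompt actually used, so the function name `build_precondition_prompt` and the wording below are hypothetical, and nothing here is claimed to reproduce the 30% to 85% result.

```python
def build_precondition_prompt(task: str) -> str:
    """Wrap a task so unstated preconditions must be listed before answering.

    The instruction wording is an illustrative assumption, not the prompt
    used in the underlying study.
    """
    return (
        "Before answering, list every unstated precondition that must hold "
        "for the task below to succeed, one per line.\n"
        "Then give your answer, checking it against each listed "
        "precondition.\n\n"
        f"Task: {task}"
    )

prompt = build_precondition_prompt("Plan how to mail a letter tomorrow morning.")
print(prompt)
```

The design choice is only to move background conditions out of the implicit frame and into the visible context, where they can act as constraints on the answer.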

Does setting temperature to zero actually make LLM outputs reliable?

Fixed seeds and zero temperature replicate the same output repeatedly, but that output remains one draw from the model's probability distribution. McDonald's omega testing across 100 repetitions reveals that consistency does not equal reliability.

Can any computable LLM truly avoid hallucinating?

Three formal theorems prove that any computable LLM must hallucinate on infinitely many inputs, and internal mechanisms like self-correction cannot eliminate this mathematical constraint. External safeguards are therefore necessary, not optional.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a terminology auditor for LLM error classification. The precise question: what should we call LLM errors when 'hallucination' collapses distinct failure modes, and how have recent models, training methods, or evaluation frameworks shifted what we can actually intervene on?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026. A curated library identified these key tensions:
• 'Hallucination' is formally inevitable for any computable LLM (2024-01, arXiv:2401.11817); the term misleads because it implies a perceptual failure when the model uses the same statistical process for true and false outputs.
• At least five mechanistically distinct failures hide under 'hallucination': social accommodation (face-saving via RLHF), Potemkin understanding (explanation ≠ execution), failed semantic legitimacy checking, error avalanching (compounding inaccuracy in self-training), and frame-blindness (omission of unstated constraints; forced enumeration lifts accuracy from 30% to 85%).
• Temperature-zero outputs appear reliable but represent a single draw from a distribution; determinism ≠ correctness, reframing the error as *unreliability* rather than inaccuracy (2024-01, arXiv:2401.06855).
• 'Fabrication' is the proposed default term for confidently-stated false content, pointing interventions toward verification and calibration rather than perceptual grounding.
• Calibration and self-improvement methods have begun surfacing which constraints are actually correctable vs. baked into autoregressive architecture (2025-04, arXiv:2504.02902).

Anchor papers (verify; mind their dates):
• arXiv:2401.11817 (2024-01): Hallucination as formal inevitability
• arXiv:2504.02902 (2025-04): Calibration in self-improving systems
• arXiv:2602.06176 (2026-02): Reasoning failures taxonomy
• arXiv:2603.29025 (2026-03): Surface heuristics overriding implicit constraints

Your task:
(1) RE-TEST EACH MECHANISM-NAME. For each proposed term (fabrication, accommodation, Potemkin understanding, avalanching, frame-blindness, autoregressive bias), ask: have newer training protocols (DPO, constitutional methods), inference techniques (mixture-of-experts, speculative decoding, in-context calibration), or evaluation harnesses (structured probing, causal intervention) *isolated* or *relaxed* these failure modes? Which remain architectural bottlenecks vs. tuning problems? Cite what changed it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months—especially any paper that argues a single unified replacement term is sufficient, or that mechanism-specific naming obscures a simpler pattern.
(3) Propose 2 research questions that assume the regime may have shifted: (a) Can post-training steer models toward explicit *uncertainty verbalization* about frame-gaps, making frame-blindness detectable before deployment? (b) Do ensemble or routing methods (e.g., conditional computation) actually separate accommodation from knowledge gaps in practice, or do they remain statistically entangled?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
