Can pretraining data statistics detect hallucinations better than model confidence?
Explores whether checking for entity combinations in the training data is a more reliable hallucination signal than the model's own confidence, especially for catching confidently wrong outputs.
Adaptive RAG systems decide when to retrieve based on the model's own confidence: if the model is uncertain, fetch external evidence. But confidence is a notoriously bad hallucination signal: models often produce confidently wrong outputs precisely on entities they have seen rarely or have never seen together. QuCo-RAG bypasses confidence entirely and uses pretraining-data statistics directly. It checks whether the entities mentioned in a query are rare and, more importantly, whether the specific entity combinations have co-occurred in real data. If a query mentions two entities that never appeared in proximity in the model's training corpus, that is the retrieval trigger.
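The trigger described above can be sketched in a few lines. This is a minimal illustration, not QuCo-RAG's actual implementation: the toy `CORPUS`, the function names, the thresholds, and the use of document-level substring matching as a stand-in for "proximity" are all assumptions made for this example. A real system would presumably query an index over the actual pretraining corpus and take entities from an entity extractor.

```python
from itertools import combinations

# Hypothetical toy stand-in for a pretraining corpus (assumption: a real
# system would query an index over the actual corpus, not scan a list).
CORPUS = [
    "marie curie won the nobel prize in physics",
    "marie curie was born in warsaw",
    "alan turing worked at bletchley park",
]

def entity_count(entity, corpus):
    """Number of documents that mention the entity (substring match)."""
    return sum(entity in doc for doc in corpus)

def cooccurrence_count(a, b, corpus):
    """Number of documents that mention both entities.

    Assumption: sharing a document is used as a crude proxy for proximity.
    """
    return sum(a in doc and b in doc for doc in corpus)

def should_retrieve(entities, corpus, min_entity_count=1, min_pair_count=1):
    """Return (trigger, reasons) from corpus statistics alone.

    Model confidence is never consulted. Both thresholds are hypothetical.
    """
    reasons = []
    for e in entities:
        if entity_count(e, corpus) < min_entity_count:
            reasons.append(f"rare entity: {e}")
    for a, b in combinations(entities, 2):
        if cooccurrence_count(a, b, corpus) < min_pair_count:
            reasons.append(f"unseen pair: {a} + {b}")
    return bool(reasons), reasons

# Both entities are individually known, but the pair never co-occurs.
print(should_retrieve(["marie curie", "bletchley park"], CORPUS))
```

In this sketch the pair check does the work: each entity passes the rarity check on its own, and only the missing combination fires the trigger.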
The methodological move is replacing an internal symptom (low confidence) with an external cause (data sparsity). Hallucination is what happens when the model interpolates over combinations it never saw; checking pretraining co-occurrence catches the condition before the symptom rather than after. This means QuCo-RAG can flag suspicious outputs even when the model is highly confident, which is the regime where calibration-based methods fail hardest. This stance is in direct tension with the note "When should retrieval happen during model generation?", which treats confidence as the right trigger; see ops/tensions/retrieval trigger signal — pretraining-data statistics vs model uncertainty.md for the full disagreement.
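To make the contrast between the two signals concrete, the sketch below runs a confidence-based trigger and a co-occurrence-based trigger over four hand-made cases. Every number here is a hypothetical input chosen for illustration, not a measurement, and both function names and thresholds are assumptions.

```python
def confidence_trigger(confidence, threshold=0.5):
    """Retrieve when the model's own confidence falls below the threshold."""
    return confidence < threshold

def cooccurrence_trigger(pair_count, min_pair_count=1):
    """Retrieve when the entity pair is too rare in the training corpus."""
    return pair_count < min_pair_count

# (label, hypothetical model confidence, hypothetical corpus pair count)
CASES = [
    ("seen pair, confident", 0.95, 12),
    ("seen pair, unsure", 0.30, 12),
    ("unseen pair, unsure", 0.30, 0),
    ("unseen pair, confident", 0.95, 0),
]

for label, conf, count in CASES:
    print(label, confidence_trigger(conf), cooccurrence_trigger(count))
```

The last case is the confidently wrong regime: the confidence trigger stays silent while the data-side trigger fires. The second case shows the opposite disagreement, where only the confidence trigger asks for retrieval.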
The cost is access to pretraining-data statistics, which is non-trivial for opaque models but tractable for open-weight ones. The deeper implication is that hallucination detection may benefit more from data-side instrumentation than from probing the model's internal states — the training distribution is the ground truth about what the model can reasonably know, and confidence is only a noisy proxy for that.
Inquiring lines that use this note as a source (48)
This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.
- Can fixing hallucination address AI's structural epistemic problem?
- Why does aggregate accuracy fail as a metric for rare harmful cases?
- How much does ROUGE metric choice inflate hallucination detection claims?
- Does inevitable LLM hallucination make detection metric validity critical?
- How does treating synthetic data as ground truth mislead inference?
- How does step-level confidence filtering compare to global confidence averaging?
- How does intersubjective validation differ from pattern recognition in training data?
- Can novelty detection alone distinguish grounded synthesis from hallucinated restatement?
- How do we assign confidence and polarity scores to belief edges?
- Does layer-wise prediction stabilization provide a stronger trace quality signal than confidence alone?
- How can stochastic beam search operationalize step-level confidence into a decoding algorithm?
- What distinguishes strategic fabrication from accidental hallucination in research agents?
- Why is hallucination the wrong term for all LLM false outputs?
- How do training data cutoffs produce false claims that stay consistent?
- How reliable is the top-2 confidence gap as a stopping signal across tasks?
- How do models decide between refusing or hallucinating?
- What role should the trust parameter play in using synthetic data as evidence?
- What skills do users need to work effectively with stochastic outputs?
- Do self-correction and chain-of-thought prompting reduce hallucination rates?
- How do external safeguards like retrieval augmentation prevent hallucination?
- What distinguishes intrinsic hallucination from extrinsic hallucination patterns?
- How does subliminal learning differ from statistical model collapse?
- Does optimizing for model confidence actually improve both performance and calibration simultaneously?
- Can unsupervised confidence-based training scale to domains beyond human evaluation reach?
- Why do language models hallucinate even with perfect training?
- What makes accurate confidence different from confident-but-wrong predictions?
- How much does confidence-guided cascading between SAS and MAS improve accuracy?
- Do confidence signals mislead patients differently in medical versus other domains?
- What training data contamination rates threaten model safety most practically?
- Why do models hallucinate when retrieval heads fail despite having information in context?
- Why do models confabulate inconsistently across different samples?
- How does semantic entropy compare to confidence scores from internal model probabilities?
- Can membership inference attacks reliably detect training data exposure?
- How does model confidence relate to accuracy in underfitted domains?
- Why do interventions for hallucination or automation bias fail to address capability misattribution?
- Can confidence levels reliably detect when a model is overthinking?
- Is hallucination mechanistically identical to generalization across datasets?
- Does representational density emerge from training data exposure during pretraining?
- When is interleaved tool feedback necessary to prevent hallucination?
- Can step-level confidence filtering work better than global confidence scoring?
- What makes out-of-band monitoring better than in-band verification loops?
- Can false positives from input filtering be reduced without sacrificing defense?
- Why does model confidence fail to detect hallucinations on rare entity pairs?
- How does interleaving reasoning with action prevent hallucination?
- Why does model confidence fail to detect hallucinations about rare entities?
- Can filtering unknown examples during fine-tuning prevent hallucination increases?
- Does retrieval augmented generation actually eliminate hallucinations in any domain?
- Does latent density emerge during pretraining from training data familiarity?
Related concepts in this collection (5)
- When should retrieval happen during model generation?
  Explores whether retrieval should occur continuously, at fixed intervals, or only when the model signals uncertainty. Standard RAG retrieves once; long-form generation requires dynamic triggering based on confidence signals.
  tension: both name the right retrieval trigger but disagree on the signal source: model-internal confidence (FLARE) vs pretraining-data statistics (QuCo-RAG); see ops/tensions/retrieval trigger signal — pretraining-data statistics vs model uncertainty.md
- Can simple uncertainty estimates beat complex adaptive retrieval?
  Does measuring a language model's own confidence on token probabilities outperform expensive multi-call adaptive retrieval pipelines? This matters because it could simplify RAG systems while reducing computational overhead.
  tension: argues uncertainty IS the efficient signal; QuCo-RAG argues uncertainty is the wrong signal entirely. Same trigger problem with mutually inconsistent recommendations
- Can any computable LLM truly avoid hallucinating?
  Explores whether formal theorems prove hallucination is mathematically inevitable for all computable language models, regardless of their design or training approach.
  supports: gives the formal reason hallucination cannot be model-side detected; QuCo-RAG accepts this and moves the detection to the data side
- When should language models retrieve external knowledge versus use internal knowledge?
  Can we model retrieval as a per-step decision problem rather than an always-on strategy? This matters because unnecessary retrieval adds noise and latency without improving accuracy.
  extends: another formulation of when-to-retrieve; DeepRAG learns a policy over per-step decisions, QuCo-RAG provides a single principled trigger that policy could use as a feature
- Does reasoning fine-tuning make models worse at declining to answer?
  When models are trained to reason better, do they lose the ability to say 'I don't know'? This matters for high-stakes applications like medical and legal AI that depend on appropriate uncertainty.
  supports: another reason to distrust internal-confidence triggers: fine-tuning regimes actively suppress the abstention signal FLARE depends on
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Fine-grained Hallucination Detection and Editing for Language Models
- Detecting hallucinations in large language models using semantic entropy
- Hallucinations Undermine Trust; Metacognition is a Way Forward
- The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs
- A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
- Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models
- Linguistic Calibration of Long-Form Generations
- Chain-of-Verification Reduces Hallucination in Large Language Models
Original note title
pretraining-data statistics should trigger retrieval not model confidence — rare entity co-occurrence flags hallucination risk that calibration cannot detect