SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation · Language, Text, and Discourse · Model Architecture and Internals

Do LLMs predict entailment based on what they memorized?

Explores whether language models make entailment decisions by recognizing memorized facts about the hypothesis rather than reasoning through the logical relationship between premise and hypothesis.

Synthesis note · 2026-02-21 · sourced from Natural Language Inference

McKenna et al. (2023) named a specific, reproducible bias in LLM entailment behavior: the attestation bias. When an LLM is asked whether premise P entails hypothesis H, its prediction tracks the hypothesis's out-of-context truthfulness (whether H is attested in the training data) rather than the conditional truth of H given P.

The proposed mechanism: if a model's training data confirms H as true independently of any premise, the model is likely to predict entailment regardless of what P says. Conversely, if H is not attested, the model is less likely to predict entailment even when entailment is the correct label. Entities serve as "indices" to memorized propositions: the presence of a known entity activates stored associations that override the in-context reasoning task.

The authors demonstrate this with a "random premise" experiment: replace the original premise with a random unrelated premise while keeping H constant. An ideal inference model should detect that entailment is no longer supported and predict "no entailment." LLMs instead maintain elevated entailment predictions when H is attested — demonstrating that they are responding to stored propositions about H, not to the P→H relationship.
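The random-premise setup described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `NLIExample`, its `attested` flag, `randomize_premises`, and the `predict` callable are all hypothetical names, and drawing a premise from a different example only approximates "unrelated" (a real test set would need to verify that the swapped premise truly does not bear on H).

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class NLIExample:
    premise: str
    hypothesis: str
    attested: bool  # hypothetical flag: is H judged true out of context?


def randomize_premises(examples: List[NLIExample], seed: int = 0) -> List[NLIExample]:
    """Keep each hypothesis, but swap in the premise of a different example."""
    n = len(examples)
    if n < 2:
        raise ValueError("need at least two examples to swap premises")
    rng = random.Random(seed)
    swapped = []
    for i, ex in enumerate(examples):
        j = rng.randrange(n - 1)  # pick an index among the other n - 1 examples
        if j >= i:
            j += 1                # skip the example's own position
        swapped.append(NLIExample(examples[j].premise, ex.hypothesis, ex.attested))
    return swapped


def entailment_rate_by_attestation(
    examples: List[NLIExample],
    predict: Callable[[str, str], bool],  # assumed: returns True for "entailment"
) -> Dict[bool, float]:
    """Fraction of 'entailment' predictions, grouped by the attested flag."""
    counts = {True: [0, 0], False: [0, 0]}  # attested -> [hits, total]
    for ex in examples:
        counts[ex.attested][1] += 1
        if predict(ex.premise, ex.hypothesis):
            counts[ex.attested][0] += 1
    return {
        flag: (hits / total if total else float("nan"))
        for flag, (hits, total) in counts.items()
    }
```

With random premises, an ideal model's entailment rate should be near zero in both groups. A gap between the attested and non-attested groups is the signature the experiment looks for.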

This connects to a complementary failure mode already in the vault. The note "Do language models actually use their encoded knowledge?" shows that encoded knowledge does not reliably affect generation. Attestation bias is the inverse problem: memorized statements do influence generation, but in the wrong direction, substituting for proper inference rather than supporting it. Both failures share a root: LLM generation is not governed by a clean separation between retrieved knowledge and in-context reasoning.

The practical implication: NLI benchmark performance measures a combination of reasoning and memorization that cannot be cleanly disentangled without carefully designed bias-adversarial test sets.
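One way to make the disentangling concrete is to score a model separately on items where the gold label agrees with attestation and items where it disagrees. The sketch below assumes each item already carries a gold label, an attestation judgment, and a model prediction; `LabeledItem` and `accuracy_by_subset` are hypothetical names introduced for illustration, not part of any existing benchmark toolkit.

```python
from typing import Dict, List, NamedTuple


class LabeledItem(NamedTuple):
    gold_entails: bool       # gold label: does P entail H?
    attested: bool           # assumed annotation: is H true out of context?
    predicted_entails: bool  # model prediction for (P, H)


def accuracy_by_subset(items: List[LabeledItem]) -> Dict[str, float]:
    """Accuracy on attestation-consistent vs attestation-adversarial items."""
    buckets: Dict[str, List[bool]] = {"consistent": [], "adversarial": []}
    for item in items:
        # "consistent": guessing from attestation alone would give the gold label
        key = "consistent" if item.gold_entails == item.attested else "adversarial"
        buckets[key].append(item.predicted_entails == item.gold_entails)
    return {
        name: (sum(correct) / len(correct) if correct else float("nan"))
        for name, correct in buckets.items()
    }
```

A model that leans on memorization should score well on the consistent subset and poorly on the adversarial one, while a model that reasons from the premise should score similarly on both.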

Original note title

llm entailment predictions are bound to hypothesis attestation rather than premise-hypothesis inference