SYNTHESIS NOTE


Do LLMs predict entailment based on what they memorized?

Explores whether language models make entailment decisions by recognizing memorized facts about the hypothesis rather than reasoning through the logical relationship between premise and hypothesis.

Synthesis note · 2026-02-21 · sourced from Natural Language Inference

McKenna et al. (2023) named a specific, reproducible bias in LLM entailment behavior: the attestation bias. When an LLM is asked whether premise P entails hypothesis H, its prediction tends to track the out-of-context truthfulness of H (whether H is attested in training data) rather than the conditional truth of H given P.
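An illustrative formalization may help. It is an assumption made for exposition, not the paper's own notation. Write A(H) ∈ {0, 1} for whether H is attested, ŷ for the model's prediction, and P_rand for a premise unrelated to H.

- An ideal model's output depends on P and H jointly.
- An attestation-biased model behaves approximately as a function of A(H) alone.
- The gap Δ_att measures the difference under a random premise.

```latex
\begin{align*}
\hat{y}_{\text{ideal}}(P, H) &= \mathbf{1}\left[ P \models H \right] \\
\Pr[\hat{y} = \text{Ent} \mid P, H] &\approx \Pr[\hat{y} = \text{Ent} \mid A(H)] && \text{(attestation-biased model)} \\
\Delta_{\text{att}} &= \Pr[\hat{y} = \text{Ent} \mid A(H) = 1,\, P_{\text{rand}}] - \Pr[\hat{y} = \text{Ent} \mid A(H) = 0,\, P_{\text{rand}}]
\end{align*}
```

A random premise should almost never entail H, so an ideal model keeps both rates near zero and Δ_att ≈ 0. A clearly positive Δ_att is the signature of attestation bias.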

The proposed mechanism works in two directions. If a model's training data confirms H as true (independently of any premise), the model is likely to predict entailment regardless of what P says. Conversely, if H is not attested, the model is less likely to predict entailment even when P does entail H. Entities serve as "indices" to memorized propositions: the presence of a known entity activates stored associations that override the in-context reasoning task.

The authors demonstrate this with a "random premise" experiment: replace the original premise with a random, unrelated premise while keeping H fixed. An ideal inference model should detect that entailment is no longer supported and predict "no entailment." LLMs instead keep predicting entailment at elevated rates when H is attested. This indicates that they respond to stored propositions about H rather than to the P→H relationship.
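The random-premise protocol can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, and none of the names come from the paper:

- Each item carries a hypothetical `attested` flag.
- `predict` stands in for any model call that returns True for "entailment".
- The replacement premise is drawn from a different item in the same pool.
- The toy predictor `attestation_only` deliberately ignores the premise, to show what pure attestation bias looks like.

```python
import random
from typing import Callable

# Toy item: (premise, hypothesis, attested). "attested" is an assumed flag saying
# whether H is true according to the model's training data.
Item = tuple[str, str, bool]
Predictor = Callable[[str, str], bool]  # True means "predicts entailment"


def random_premise_rates(items: list[Item], predict: Predictor, seed: int = 0) -> dict[bool, float]:
    """Replace each premise with another item's premise (assumed unrelated to H)
    and return the entailment-prediction rate for attested vs. non-attested H."""
    if len(items) < 2:
        raise ValueError("need at least two items to draw a different premise")
    rng = random.Random(seed)
    tallies = {True: [0, 0], False: [0, 0]}  # attested -> [entailment predictions, total]
    for i, (_, hypothesis, attested) in enumerate(items):
        # Choose any other item's premise so P is not the original premise for H.
        j = rng.choice([k for k in range(len(items)) if k != i])
        random_premise = items[j][0]
        tallies[attested][0] += int(predict(random_premise, hypothesis))
        tallies[attested][1] += 1
    return {a: (hits / n if n else 0.0) for a, (hits, n) in tallies.items()}


# Hypothetical toy data and a predictor that looks only at memorized hypotheses.
items: list[Item] = [
    ("Obama visited Kenya in 2015.", "Obama was born in Hawaii.", True),
    ("Paris hosted the 2024 Olympics.", "Paris is the capital of France.", True),
    ("The startup Zyloq raised a seed round.", "Zyloq was founded in Oslo.", False),
    ("A new bakery opened downtown.", "The bakery sells rye bread.", False),
]
memorized = {"Obama was born in Hawaii.", "Paris is the capital of France."}


def attestation_only(premise: str, hypothesis: str) -> bool:
    return hypothesis in memorized  # ignores the premise entirely


rates = random_premise_rates(items, attestation_only)
print(rates)  # {True: 1.0, False: 0.0}
```

An ideal reasoner would keep both rates near zero, so the gap between the attested and non-attested rates is what the experiment looks for. Drawing replacements from the same pool is a simplification. A real study would also need to check that a swapped premise does not accidentally support H.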

Attestation bias complements a failure mode already in the vault. "Do language models actually use their encoded knowledge?" shows that encoded knowledge doesn't reliably affect generation. Attestation bias is the inverse problem: memorized statements do influence generation, but in the wrong direction, substituting for proper inference rather than supporting it. Both failures point to the same root: LLM generation is not governed by a clean separation between retrieved knowledge and in-context reasoning.

The practical implication: NLI benchmark performance measures a combination of reasoning and memorization that cannot be cleanly disentangled without carefully designed bias-adversarial test sets.
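One generic way to build such a bias-adversarial test set is sketched below. It is a sketch under assumptions, not necessarily the construction McKenna et al. used:

- Split labeled items by whether attestation agrees with the gold label.
- Compare accuracy on the two halves.

The `attested` flags, the toy items, and the helper names (`split_by_attestation`, `attestation_gap`) are all hypothetical. In practice attestation has to be estimated somehow, and any such proxy is itself an assumption.

```python
from dataclasses import dataclass
from typing import Callable

Predictor = Callable[[str, str], bool]  # True means "predicts entailment"


@dataclass(frozen=True)
class NLIItem:
    premise: str
    hypothesis: str
    gold_entails: bool  # gold label: does P entail H?
    attested: bool      # assumed flag: is H true per the model's training data?


def split_by_attestation(items: list[NLIItem]) -> tuple[list[NLIItem], list[NLIItem]]:
    """Consistent: attestation agrees with the gold label. Adversarial: it disagrees."""
    consistent = [x for x in items if x.attested == x.gold_entails]
    adversarial = [x for x in items if x.attested != x.gold_entails]
    return consistent, adversarial


def accuracy(items: list[NLIItem], predict: Predictor) -> float:
    if not items:
        return float("nan")
    return sum(predict(x.premise, x.hypothesis) == x.gold_entails for x in items) / len(items)


def attestation_gap(items: list[NLIItem], predict: Predictor) -> float:
    """Accuracy on consistent items minus accuracy on adversarial items."""
    consistent, adversarial = split_by_attestation(items)
    return accuracy(consistent, predict) - accuracy(adversarial, predict)


# Hypothetical toy benchmark covering all four (gold, attested) combinations.
bench = [
    NLIItem("Obama was born in Honolulu.", "Obama was born in Hawaii.", True, True),
    NLIItem("Obama visited Kenya.", "Obama was born in Hawaii.", False, True),
    NLIItem("Zyloq was founded in Oslo.", "Zyloq was founded in Norway.", True, False),
    NLIItem("Zyloq raised a seed round.", "Zyloq was founded in Norway.", False, False),
]
memorized = {"Obama was born in Hawaii."}
gold = {(x.premise, x.hypothesis): x.gold_entails for x in bench}


def attestation_only(premise: str, hypothesis: str) -> bool:
    return hypothesis in memorized  # ignores the premise


def oracle(premise: str, hypothesis: str) -> bool:
    return gold[(premise, hypothesis)]  # perfect reasoner, for comparison


print(attestation_gap(bench, attestation_only))  # 1.0
print(attestation_gap(bench, oracle))            # 0.0
```

A model that reasons from P to H should score about the same on both subsets. A large positive gap means its accuracy depends on attestation lining up with the label.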

Inquiring lines that read this note 54

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

Can AI-generated outputs constitute genuine knowledge or valid claims?

How does instrumental reasoning reproduce pre-Enlightenment knowledge structures?

How do language models establish social grounding in human dialogue?

Can prompting inject entirely new knowledge into language models?

How does surface salience compete with background knowledge in model inference?

How can models identify insufficient information and respond appropriately without guessing?

How do models signal knowledge gaps through token probability?

Do language models learn genuine linguistic structure or just surface patterns?

How should retrieval systems optimize for multi-step reasoning during inference?

How do entailment checks prevent synthetic data from degrading retrieval corpora?

How do training priors constrain what context information can override?

Do language models perform faithful symbolic reasoning independent of semantic grounding?

How faithfully do LLMs reflect their actual reasoning in outputs and explanations?

How do language models inherit human biases from training data?

Why does hypothesis attestation bias exist separately from frequency bias in NLI?

How do training data properties shape reasoning capability development?

How can entailment benchmarks separate genuine reasoning from memorization effects?

Can prompting strategies overcome LLM biases without model fine-tuning?

Why do entities trigger memorized propositions instead of enabling reasoning?

Why do language models struggle with implicit discourse relations?

Why do continual learning scenarios trigger catastrophic forgetting and interference?

How does memorization interact with learning and generalization?

Why is extracting training data insufficient proof that models memorize?

Why does supervised fine-tuning improve accuracy while degrading reasoning quality?

How does fine-tuning on natural language inference affect fallacy susceptibility?

Do language models understand semantics or rely on pattern matching?

Why do language models reinforce false assumptions instead of correcting them?

How do LLMs distinguish causal reasoning from temporal and semantic associations?

Why do reasoning models fail at systematic problem-solving and search?

Do base models contain latent reasoning that training can unlock?

Do accurate-looking LLM outputs hide structural failures in learning and reasoning?

How does the LLM Fallacy prevent users from noticing cognitive debt accumulating?

How do evaluation biases undermine LLM quality assessment systems?

Why does probability of text completion not equal knowledge value?

Why does finetuning cause catastrophic forgetting of model capabilities?

How do neural networks separate factual knowledge from reasoning abilities?

What makes procedural knowledge in documents generalize better than facts?

Does model scaling alone produce compositional generalization without symbolic mechanisms?

How does evidence retrieval affect compositional reasoning in language models?

Related concepts in this collection 3


Do language models actually use their encoded knowledge? Probes can detect that LMs encode facts internally, but do those encoded facts causally influence what the model generates? This explores the gap between knowing and doing.
the complementary failure: encoded knowledge that doesn't influence generation; attestation bias is memorized knowledge that influences generation in the wrong direction
Why do language models ignore information in their context? Explores why language models sometimes override contextual information with prior training associations, and whether providing more context can solve this problem.
same mechanism: parametric associations override in-context information
Does fine-tuning on NLI teach inference or amplify shortcuts? When LLMs are fine-tuned on natural language inference datasets, do they learn genuine reasoning abilities or become better at exploiting statistical patterns in the training data? Understanding this distinction matters for assessing model capabilities.
fine-tuning makes attestation-related frequency bias worse, not better
