Why do language models accept false assumptions they know are wrong?
Explores why LLMs fail to reject false presuppositions embedded in questions even when they possess correct knowledge about the topic. This matters because it reveals a grounding failure distinct from knowledge deficits.
The FLEX Benchmark study presents one of the clearest findings about LLM grounding behavior: models do not systematically reject misinformation even when they possess accurate knowledge. The finding is more troubling than "LLMs don't know things" — they fail to correct things they demonstrably know.
The setup: LLMs were asked both direct knowledge questions ("Is it true that party X supports Y?") and loaded questions that embedded false presuppositions via factive verbs ("Did voters resent the fact that party X supports Y?" — where the presupposition is false). Models that answered direct questions correctly — demonstrating knowledge — still frequently accommodated the false presupposition in the loaded version rather than rejecting it.
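The pairing can be made concrete with a minimal sketch of building a direct/loaded pair from one proposition. The templates, the `make_question_pair` helper, and the example proposition are hypothetical illustrations, not the benchmark's actual materials. The point is structural: the loaded version moves the claim out of what is asked and into what is presupposed, via the factive construction "resent the fact that".

```python
from dataclasses import dataclass


@dataclass
class QuestionPair:
    """A direct knowledge probe and a loaded question sharing one proposition."""
    proposition: str
    direct: str
    loaded: str


# Hypothetical templates for illustration only; not the study's item set.
DIRECT_TEMPLATE = "Is it true that {p}?"
# "resent the fact that" is factive: it presupposes its complement is true.
LOADED_TEMPLATE = "Did voters resent the fact that {p}?"


def make_question_pair(proposition: str) -> QuestionPair:
    """Build a question that asks about the proposition and one that
    presupposes it instead of asking about it."""
    return QuestionPair(
        proposition=proposition,
        direct=DIRECT_TEMPLATE.format(p=proposition),
        loaded=LOADED_TEMPLATE.format(p=proposition),
    )


pair = make_question_pair("party X supports Y")
print(pair.direct)  # Is it true that party X supports Y?
print(pair.loaded)  # Did voters resent the fact that party X supports Y?
```

If the proposition is false, a well-grounded model should answer the direct question with "no" and respond to the loaded question by flagging the false presupposition, not by answering yes or no.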
Results: GPT-4 achieved the best rejection rate, 84.08%, still well short of the ideal 100%. Mistral rejected only 2.44% of false presuppositions and actively amplified the false information at a rate of 91.51%. Llama fell in between at roughly 50% rejection. Most revealing: accommodation remained prevalent even where models had strong, correct knowledge. In the study's comparison by belief strength, the lowest bar in the weak-belief group was twice as high as the highest bar in the strong-belief group, meaning false knowledge produced more accommodation than correct knowledge produced rejection.
This has a specific implication: the failure is not a knowledge problem. The models know the correct facts. The failure lies in grounding behavior: detecting a false presupposition, flagging it, and initiating correction instead of accommodating it. Because models avoid correcting false user claims even when they know better (see "Why do language models avoid correcting false user claims?"), the issue is conversational strategy, not factual competence.
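The "not a knowledge problem" argument rests on conditioning: measure rejection only on items where the model answered the direct question correctly. A minimal sketch of that conditional rate follows, assuming each item has already been labeled with whether the model knew the fact and whether its loaded-question response rejected or accommodated the presupposition. The `ItemResult` record, the label strings, and the toy data are hypothetical; they are not the study's data or scoring scheme.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ItemResult:
    """Outcome for one proposition (hypothetical labeling scheme)."""
    knew_fact: bool        # answered the direct question correctly
    loaded_response: str   # "reject", "accommodate", or "other"


def rejection_rate(results: Iterable[ItemResult], *, knew_fact: bool) -> float:
    """Share of loaded questions whose false presupposition was rejected,
    restricted to items where the model did (or did not) know the fact.
    Returns NaN when the subset is empty."""
    subset = [r for r in results if r.knew_fact == knew_fact]
    if not subset:
        return float("nan")
    counts = Counter(r.loaded_response for r in subset)
    return counts["reject"] / len(subset)


# Invented toy data, only to show the computation.
results = [
    ItemResult(True, "reject"),
    ItemResult(True, "accommodate"),
    ItemResult(True, "accommodate"),
    ItemResult(True, "reject"),
    ItemResult(False, "accommodate"),
    ItemResult(False, "accommodate"),
]
print(rejection_rate(results, knew_fact=True))   # 0.5
print(rejection_rate(results, knew_fact=False))  # 0.0
```

If the failure were purely a knowledge deficit, rejection in the `knew_fact=True` subset should approach 100%; the study reports that accommodation remained prevalent even there.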
The political domain makes this especially consequential. False presuppositions are efficient misinformation carriers — they introduce beliefs as background assumptions rather than direct claims, and accommodation means accepting them without scrutiny.
Related concepts in this collection
- Why do language models avoid correcting false user claims?
  Explores whether LLM grounding failures stem from missing knowledge or from conversational dynamics. Examines whether models use face-saving strategies similar to humans when disagreement is needed.
  Relation: the mechanism behind this failure; models avoid disagreement even when correct.
- Do language models actually build shared understanding in conversation?
  When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.
  Relation: this is the active form; not just presuming common ground but actively accommodating false common ground.
- Does preference optimization damage conversational grounding in large language models?
  Exploring whether RLHF and preference optimization actively reduce the communicative acts (clarifications, acknowledgments, confirmations) that build shared understanding in dialogue. This matters for high-stakes applications like medical and emotional support.
  Relation: RLHF reinforces the accommodation behavior through the training signal.
- Why do language models struggle with questions containing false assumptions?
  Do LLMs reliably detect and reject questions built on false premises? The (QA)2 benchmark tests this directly, measuring whether models can identify problematic assumptions embedded in naturally plausible questions.
  Relation: quantifies the QA performance drop caused by false assumptions.
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions
- LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High
- Explicit Inductive Inference using Large Language Models
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
- Linguistic Calibration of Long-Form Generations
- The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning
- Neutralizing Bias in LLM Reasoning using Entailment Graphs
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
Original note title
llms fail to reject false presuppositions even when knowledge is present