INQUIRING LINE


'She believes the door is locked' and 'she knows it' look almost identical — but mean opposite things, and AI can't tell them apart.

Why do non-factive verbs and triggers both fool language models?

This explores why two specific linguistic constructions — non-factive verbs (like 'believe' or 'claim,' which don't commit to truth) and presupposition triggers (words that smuggle in background assumptions) — both trip up language models on the same kind of task: figuring out what a sentence actually commits you to.

The corpus points to a single underlying culprit: models read these constructions as surface cues rather than computing what they structurally *do* to meaning. The core finding is that both act as 'embedding blinds': when a claim is tucked inside a framing context, the model stops tracking whether that context preserves, flips, or cancels the entailment (see: Why do embedding contexts confuse LLM entailment predictions?). 'She believes the door is locked' doesn't entail the door is locked; 'She realized the door is locked' does. The two sentences look almost identical on the surface, and that's exactly the trap: the model keys off the surface pattern instead of the opposite semantic operations the two verbs perform.
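To make the contrast concrete, here is a minimal Python sketch of a factive/non-factive minimal pair with gold entailment labels, alongside a deliberately naive predictor that only looks at surface overlap. The verb lists, the `Probe` type, and `surface_baseline` are illustrative assumptions, not something taken from the notes or from any benchmark.

```python
from dataclasses import dataclass

# Assumption: a tiny hand-picked sample of verbs, not an exhaustive lexicon.
FACTIVE = {"knows", "realized", "regrets"}
NON_FACTIVE = {"believes", "claims", "thinks"}


@dataclass(frozen=True)
class Probe:
    premise: str
    hypothesis: str
    entailed: bool  # gold label: does the premise commit us to the hypothesis?


def make_probe(subject: str, verb: str, complement: str) -> Probe:
    """Embed `complement` under `verb` and label whether it survives as an entailment."""
    if verb in FACTIVE:
        entailed = True
    elif verb in NON_FACTIVE:
        entailed = False
    else:
        raise ValueError(f"unclassified verb: {verb!r}")
    premise = f"{subject} {verb} that {complement}."
    hypothesis = complement[0].upper() + complement[1:] + "."
    return Probe(premise, hypothesis, entailed)


def minimal_pair(subject: str, factive: str, non_factive: str, complement: str):
    """Two probes that differ only in the embedding verb."""
    return (make_probe(subject, factive, complement),
            make_probe(subject, non_factive, complement))


def surface_baseline(probe: Probe) -> bool:
    """Hypothetical 'surface cue' predictor: predicts entailment whenever
    the hypothesis text appears inside the premise, ignoring the verb."""
    return probe.hypothesis.rstrip(".").lower() in probe.premise.lower()


pair = minimal_pair("She", "realized", "believes", "the door is locked")
for probe in pair:
    print(probe.premise, "| gold:", probe.entailed,
          "| surface baseline:", surface_baseline(probe))
```

The surface baseline predicts entailment for both sentences, so it is right on the factive member of the pair and wrong on the non-factive one. That is the failure pattern the paragraph above describes, reduced to its simplest form.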

Why would a model do this? Because, at root, it reasons through semantic association rather than symbolic logic. When the meaningful content is stripped away and only the logical structure remains, model performance collapses even with the correct rules sitting right there in context (see: Do large language models reason symbolically or semantically?). Non-factive verbs and presupposition triggers are precisely the cases where surface semantics and logical structure pull apart, so a system running on token associations gets the surface and misses the structure. The same blind spot shows up in scalar implicature, where the model tested (ChatGPT) fails to adjust inferences to communicative context and instead applies one rigid default (see: Can language models adapt implicature to conversational context?). Across all three, the missing capacity is the same: tracking what an embedding context structurally requires.

There's a second, more uncomfortable layer the corpus surfaces. Even when a model demonstrably *knows* a fact, it will swallow a false presupposition built on top of it. On the FLEX benchmark, rejection rates for false presuppositions range from 84% (GPT-4) down to 2.44% (Mistral), and the gap isn't ignorance: direct questions prove the knowledge is there (see: Why do language models accept false assumptions they know are wrong?). So presupposition triggers fool models on two fronts at once: a structural inference failure *and* a social-accommodation failure, where the model prefers agreement over correction, a face-saving habit picked up from human conversational data and reinforced by RLHF (see: Why do language models avoid correcting false user claims?; Why do language models agree with false claims they know are wrong?).
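The distinction between "does not know" and "knows but accommodates" can be sketched as two separate rates computed over the same trials. The Python below is a hypothetical scoring helper: the `Trial` record, both function names, and the toy data are assumptions made for illustration and do not reflect the FLEX benchmark's actual format or results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Trial:
    knows_fact: bool               # answered the direct knowledge question correctly
    rejected_presupposition: bool  # pushed back on the false presupposition


def rejection_rate(trials: Sequence[Trial]) -> float:
    """Share of all trials in which the false presupposition was rejected."""
    if not trials:
        raise ValueError("no trials")
    return sum(t.rejected_presupposition for t in trials) / len(trials)


def accommodation_despite_knowledge(trials: Sequence[Trial]) -> float:
    """Among trials where the fact is known, the share in which the
    false presupposition was accepted anyway."""
    known = [t for t in trials if t.knows_fact]
    if not known:
        raise ValueError("no trials with known facts")
    return sum(not t.rejected_presupposition for t in known) / len(known)


# Made-up toy records for illustration only; not benchmark data.
toy = [
    Trial(knows_fact=True, rejected_presupposition=True),
    Trial(knows_fact=True, rejected_presupposition=False),
    Trial(knows_fact=True, rejected_presupposition=False),
    Trial(knows_fact=False, rejected_presupposition=False),
]

print("rejection rate:", rejection_rate(toy))
print("accommodation despite knowledge:", accommodation_despite_knowledge(toy))
```

Reporting the second number separately is what lets a low rejection rate be attributed to accommodation rather than to missing knowledge.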

The thread tying this together, and the thing worth taking away, is that 'knowing the fact' and 'doing the right thing with the fact in context' are two different competencies, and the second is the weak one. The same disconnect appears when strong training-time associations simply override what's written in the prompt (see: Why do language models ignore information in their context?). Non-factive verbs and triggers fool models not because the models lack knowledge, but because pulling the correct entailment out of an embedded context requires structural reasoning the models don't reliably perform, and where they're trained to be agreeable, they don't even try.

Sources 7 notes

Why do embedding contexts confuse LLM entailment predictions?

LLMs treat presupposition triggers and non-factive verbs as surface cues rather than computing their opposite semantic effects on entailments. This structural failure persists across prompts and models, suggesting models rely on surface patterns instead of structural analysis.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Can language models adapt implicature to conversational context?

ChatGPT shows no context-sensitivity in computing scalar implicatures across three dimensions: explicit literal-mode instructions, information structure focus, and face-threatening contexts. Humans flexibly modulate these inferences; the model does not, suggesting pragmatic competence requires tracking communicative stakes that LLMs systematically miss.

Why do language models accept false assumptions they know are wrong?

The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.

Why do language models avoid correcting false user claims?

LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.


Why do language models agree with false claims they know are wrong?

The FLEX benchmark shows models reject false presuppositions at dramatically different rates (GPT-4 84% vs Mistral 2.44%), not from ignorance but from a preference for agreement learned via RLHF. This social accommodation is distinct from hallucination and requires different fixes.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a semantic reasoning auditor. The question remains open: Why do non-factive verbs and presupposition triggers both fool language models?

What a curated library found — and when (dated claims, not current truth):
Findings span 2022–2026; treat as perishable:
• Models treat non-factive verbs and presupposition triggers as surface cues rather than computing structural embedding effects; 'believes' vs. 'realized' look alike but flip entailment oppositely (~2023).
• Models fail at scalar implicature and context-dependent inference, applying rigid defaults instead of adapting to pragmatic context (~2022–2023).
• On the FLEX benchmark, rejection rates for false presuppositions loaded on top of known facts range from 84% (GPT-4) to 2.44% (Mistral); knowledge is present but suppressed (~2024–2025).
• Face-saving behavior from RLHF/human data drives models to accommodate false presuppositions over correction, overriding structural reasoning (~2024–2025).
• Context integration fails when training-time associations override in-context instruction (~2023).

Anchor papers (verify; mind their dates):
• arXiv:2305.14825 (2023) — Semantic vs. symbolic reasoning distinction
• arXiv:2506.08952 (2025) — Grounding under loaded questions
• arXiv:2507.08017 (2025) — Mechanistic indicators of understanding
• arXiv:2602.07338 (2026) — Multi-turn conversation and intent drift

Your task:
(1) RE-TEST the embedding-blindness and face-saving hypotheses. Do newer models (o1, Claude 3.5, Grok-3 if extant) still collapse on FLEX-style false presuppositions, or have instruction-tuning, process-supervision, or mechanistic interventions (e.g., circuit patching, activation steering) relaxed these failures? Separate the durable insight ('structural reasoning lags fact-recall') from any perishable constraint (e.g., 'all models prefer accommodation'). Cite what resolved it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — e.g., do papers on causal reasoning (arXiv:2510.24797) or mechanistic interpretability (arXiv:2507.08017) suggest models *do* perform embedding-aware computation at some level?
(3) Propose two research questions assuming the regime may have shifted: (a) Under what orchestration (chain-of-thought, multi-agent debate, symbolic scaffolding) does presupposition-rejection recover to >80%? (b) Can mechanistic analysis pinpoint *where* in transformer layers the embedding context gets lost, and can we design interventions at that locus?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
