Do language models fail at identifying unstated preconditions?
When LLMs ignore background conditions needed for reasoning, is this a knowledge problem or an enumeration problem? Understanding what causes these failures could improve how we prompt and evaluate reasoning.
The classical frame problem (McCarthy and Hayes, 1969) asks how a reasoning system decides which background conditions are relevant when reasoning about an action. Most things stay the same when an action is performed (this unchanged background is the frame), and the system has to work out which things do change without being told. Addressing this in symbolic AI required either explicit frame axioms (combinatorially expensive) or non-monotonic logics (mathematically delicate).
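To make the cost of explicit frame axioms concrete, here is a minimal sketch over a toy domain. The action and fluent names are invented for illustration and are not taken from McCarthy and Hayes. It emits one "this stays unchanged" axiom for every action paired with every fluent that action does not affect.

```python
from itertools import product

# Hypothetical toy domain: each action maps to the set of fluents it changes.
EFFECTS = {
    "drive_to_car_wash": {"car_location", "driver_location", "fuel_level"},
    "walk_to_car_wash": {"driver_location"},
    "wash_car": {"car_clean"},
}
FLUENTS = ["car_location", "driver_location", "fuel_level", "car_clean", "weather"]


def frame_axioms(effects, fluents):
    """Return one (action, fluent) pair per 'this fluent is unaffected' axiom."""
    return [
        (action, fluent)
        for action, fluent in product(effects, fluents)
        if fluent not in effects[action]
    ]


axioms = frame_axioms(EFFECTS, FLUENTS)
for action, fluent in axioms:
    print(f"{action} leaves {fluent} unchanged")
print(len(axioms), "frame axioms for", len(EFFECTS), "actions and", len(FLUENTS), "fluents")
```

Even in this toy domain, 3 actions and 5 fluents need 10 axioms. The count is bounded by the number of actions times the number of fluents, so it grows quickly as either set is enlarged.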
The Heuristic Override Benchmark (HOB) identifies a contemporary version of the same problem in LLMs. When a user asks "should I walk or drive to the car wash 50m away," the relevant unstated condition is that the car itself must end up at the car wash, so walking cannot achieve the goal. This is a feasibility precondition that no human would need to state because it is presupposed by the entire setup. The model fails not because it lacks the world knowledge, which it has, but because it does not bring this background condition forward as relevant when the surface heuristic ("50m is walkable") is active.
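The contrast between the surface heuristic and the feasibility precondition can be sketched as two decision functions. This is an illustrative toy model, not the benchmark's code: the `Scenario` fields, the `must_bring` precondition, and the 500 m walking threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scenario:
    goal: str
    distance_m: float
    # Hypothetical precondition: an object that must arrive at the destination.
    must_bring: Optional[str] = None


def surface_heuristic(s: Scenario) -> str:
    """Decide on distance alone (the 500 m threshold is an arbitrary assumption)."""
    return "walk" if s.distance_m <= 500 else "drive"


def with_preconditions(s: Scenario) -> str:
    """Check feasibility first, then fall back to the distance heuristic."""
    if s.must_bring == "car":
        return "drive"  # walking leaves the car behind, so the goal cannot be met
    return surface_heuristic(s)


car_wash = Scenario(goal="wash the car", distance_m=50, must_bring="car")
print(surface_heuristic(car_wash))
print(with_preconditions(car_wash))
```

Both functions have access to the same facts. The difference is only whether the precondition is consulted before the distance rule fires, which mirrors the claim that the failure is about bringing a known condition forward rather than about missing knowledge.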
This reframes the failure. It is not noise filtering (the standard shortcut-learning frame). It is not knowledge retrieval (the standard hallucination frame). It is enumeration: which of the indefinitely many things I know about the world should I treat as live constraints on this decision? Structured prompting that forces enumeration ("what must be true for walking to be feasible?") raises accuracy from around 30 percent to 85 percent on single instances. The intervention works precisely because it externalizes the enumeration step the model cannot reliably perform on its own.
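A structured prompt of this kind might look like the following sketch. The wording and the `build_enumeration_prompt` helper are hypothetical and are not the prompt used in the benchmark; the point is only that the template asks for each option's preconditions before it allows a choice.

```python
from typing import Sequence


def build_enumeration_prompt(question: str, options: Sequence[str]) -> str:
    """Wrap a question so preconditions are listed before any answer is given."""
    lines = [f"Question: {question}", ""]
    lines.append("Before answering, work through these steps in order:")
    lines.append("1. State the goal the user is trying to achieve.")
    # One enumeration step per candidate option, numbered from 2.
    for i, option in enumerate(options, start=2):
        lines.append(f"{i}. What must be true for '{option}' to achieve that goal?")
    n = len(options) + 2
    lines.append(f"{n}. Rule out every option whose preconditions fail.")
    lines.append(f"{n + 1}. Only then choose among the remaining options.")
    return "\n".join(lines)


prompt = build_enumeration_prompt(
    "Should I walk or drive to the car wash 50m away?",
    ["walking", "driving"],
)
print(prompt)
```

The resulting string can be sent to whichever model is being evaluated. Placing the elimination step before the final choice is a design decision of this sketch, intended to keep the surface heuristic from settling the answer first.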
The frame problem was once thought specific to symbolic systems. The HOB results suggest it persists, in a different form, in statistical systems trained on language. The substrate changed; the structural problem did not.
Inquiring lines that use this note as a source (43)
This note is a source for these synthesized inquiries.
- Can LLMs propose pivots that change what counts as background context?
- Should LLMs query users back when presented with under-specified scenarios?
- What should we call errors in LLM outputs when hallucination does not apply?
- How do fixed pragmatic templates prevent models from understanding context?
- Can prompting techniques reliably force models to enumerate hidden constraints?
- What other latent LLM capabilities remain inactive without explicit activation cuing?
- How much of LLM reasoning failure stems from missing knowledge versus signal weighting?
- What makes a problem instance unfamiliar to a language model?
- Can structured prompting reliably force models to enumerate preconditions?
- How does the frame problem differ between symbolic and statistical reasoning systems?
- What makes a background condition relevant to a specific reasoning task?
- Can models identify what information they are missing in underspecified problems?
- Why do reasoning models fail on structurally unfamiliar instances?
- Can LLMs explain concepts correctly while failing to use them?
- What causes LLMs to ignore unstated constraints they know about?
- How do embedding contexts like presupposition triggers affect LLM entailment reasoning?
- What distinguishes entity errors from relation errors in LLM output?
- Why do NLP benchmarks hide LLM failures in ambiguity handling?
- Why can LLMs identify argument structure but not check warrants?
- Why do LLMs fail when asked to use counter-commonsense rules explicitly?
- Why do LLMs struggle with negation and exception handling?
- Why can't LLMs reason from first principles or initial commitments?
- Can prompt engineering and external knowledge bases fix ambiguity recognition failures?
- Why do LLMs choose surface-order quantifier scope over contextually correct readings?
- What specific linguistic features cause LLMs to fail at trivial entailment?
- Why do language models struggle with formal logical reasoning and joins?
- How does the Question Under Discussion shape what counts as presupposed?
- How do structured prompts force LLMs to check for contradictions in evidence?
- Does LLM reasoning always match the outputs it generates?
- Why do LLMs struggle to translate natural language into logical formalizations?
- Why do reasoning models fail at learning hidden rules from sparse exceptions?
- Why do LLMs fail at counterfactual reasoning despite factual knowledge?
- What concrete problems do LLMs solve at the computational level?
- What failure modes does the negative-space checklist generation method actually catch?
- Why do language models fail at understanding ambiguous or complex requirements?
- Why do LLMs strip applicability conditions during memory abstraction?
- Can partial formal verification work without full formalization of language semantics?
- Can we use LLM language without adopting LLM assumptions?
- How do LLMs lose information when translating natural language to formal logic?
- Why do LLMs fail at faithful autoformalisation of reasoning problems?
- What semantic information is necessary to preserve for sound LLM reasoning?
- Can irrelevant information reliably expose the limits of LLM reasoning?
- Why do LLMs reason fluently about causality but lack causal rigor?
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
- QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
- Probing Structured Semantics Understanding and Generation of Language Models via Question Answering
- Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
- Large Language Model Reasoning Failures
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
- Can Large Language Models Reason and Optimize Under Constraints?
Original note title
The modern frame problem manifests as enumeration failure of unstated preconditions not noise filtering