What semantic information is necessary to preserve for sound LLM reasoning?
This explores what kinds of meaning an LLM has to keep intact to reason reliably — and the corpus answers mostly by showing what breaks when that meaning is stripped, left unstated, or mistranslated.
The corpus suggests the honest answer is uncomfortable: for these models, *almost all of it*, because they reason through meaning, not through form. When semantic content is decoupled from a reasoning task (the same logical structure dressed in nonsense tokens), performance collapses even when the correct rules sit right there in the prompt (see "Do large language models reason symbolically or semantically?"). So the first thing to preserve is the very thing symbolic systems throw away: the model leans on token associations and learned commonsense, not on a rule it can apply blind to content.
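The decoupling described above can be sketched as a small transformation: keep the logical skeleton of a prompt and swap every content term for a consistent nonsense token. This is a minimal illustration under stated assumptions, not the procedure used in the cited work; the function name `decouple_semantics`, the `wug` token scheme, and the example sentence are all hypothetical.

```python
import re


def decouple_semantics(text: str, content_terms: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each content term with a consistent nonsense token.

    The logical structure of the text is untouched; only the words that
    carry commonsense associations are swapped out.
    """
    # Longest terms first, so a longer term wins over a shorter one it contains.
    ordered = sorted(set(content_terms), key=lambda t: (-len(t), t))
    if not ordered:
        return text, {}
    # Hypothetical token scheme: wug0, wug1, ...
    mapping = {term: f"wug{idx}" for idx, term in enumerate(ordered)}
    # A single pass with one alternation avoids re-replacing inserted tokens.
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text), mapping


semantic = "Every bird can fly. Tweety is a bird. Can Tweety fly?"
symbolic, mapping = decouple_semantics(semantic, ["bird", "fly", "Tweety"])
print(symbolic)
```

Running a model on both `semantic` and `symbolic` with the same rules in context gives a paired comparison: any accuracy gap between the two is attributable to the content words, since the inference pattern is identical.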
That dependence shows up as a set of specific things models fail to hold onto. They drop unstated preconditions, the background conditions a task quietly assumes, and forcing explicit enumeration of those conditions raises accuracy from 30% to 85%, a modern version of the old frame problem surfacing inside a statistical system (see "Do language models fail at identifying unstated preconditions?"). They also lose track of which proposition is doing the logical work: instead of checking whether a premise supports a hypothesis, models predict entailment based on whether the hypothesis looks *attested* (familiar from training), and they keep saying "entailed" even when the premise is randomized (see "Do LLMs predict entailment based on what they memorized?"). The relationship between premise and conclusion is exactly the semantic information that gets discarded.
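The random-premise check mentioned above can be sketched as a control condition: pair every hypothesis with a premise from a different example and see whether the "entailed" rate drops. The sketch below is an assumption-laden illustration rather than the cited experimental setup; `randomize_premises`, `entailed_rate`, the toy `biased_predict` stand-in, and the example pairs are all hypothetical.

```python
import random
from typing import Callable

Pair = tuple[str, str]  # (premise, hypothesis)


def randomize_premises(pairs: list[Pair], seed: int = 0) -> list[Pair]:
    """Give every hypothesis a premise taken from a different pair."""
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to swap premises")
    # Rotating by a non-zero offset guarantees no hypothesis keeps its own premise.
    shift = random.Random(seed).randrange(1, len(pairs))
    premises = [premise for premise, _ in pairs]
    rotated = premises[shift:] + premises[:shift]
    return [(premise, hypothesis) for premise, (_, hypothesis) in zip(rotated, pairs)]


def entailed_rate(pairs: list[Pair], predict: Callable[[str, str], bool]) -> float:
    """Fraction of pairs the predictor labels as entailed."""
    return sum(predict(p, h) for p, h in pairs) / len(pairs)


# Toy stand-in for a model with attestation bias: it ignores the premise entirely.
ATTESTED = {"Paris is in France.", "The street is wet."}


def biased_predict(premise: str, hypothesis: str) -> bool:
    return hypothesis in ATTESTED


pairs = [
    ("Paris is the capital of France.", "Paris is in France."),
    ("The rain soaked the street.", "The street is wet."),
    ("Ana bought a bike.", "Ana owns a bike."),
]
control = randomize_premises(pairs)
print(entailed_rate(pairs, biased_predict), entailed_rate(control, biased_predict))
```

A predictor that actually reads the premise should score much lower on `control` than on `pairs`. The toy predictor scores the same on both, which is the signature of responding to the hypothesis alone.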
The most striking cases are operators that *flip* meaning. Presupposition triggers and non-factive verbs ("believes," "pretended," "failed to") change what a sentence entails, and models treat them as surface cues rather than computing their actual effect, a structural blind spot that survives across prompts and models (see "Why do embedding contexts confuse LLM entailment predictions?"). The same gap appears when LLMs translate natural language into formal logic: they produce well-formed expressions that are semantically *wrong*, with errors clustering exactly where meaning is delicate: quantifier scope, predicate granularity, what-modifies-what (see "Can large language models translate natural language to logic faithfully?"). Sound reasoning needs scope, polarity, and quantifier precision preserved; these are the first casualties.
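A worked example may make the scope problem concrete. The sentence and predicate names below are illustrative assumptions, not drawn from the corpus; the block assumes the `amsmath` package for `align*` and `\text`. All three formulas are well-formed first-order logic, yet they say different things.

```latex
% Illustrative sentence (an assumption, not from the corpus): "Every student read a book."
\begin{align*}
% (1) Surface scope: each student read some book, possibly a different one each.
\text{(1)}\quad & \forall x\,\bigl(\mathrm{Student}(x) \rightarrow \exists y\,(\mathrm{Book}(y) \wedge \mathrm{Read}(x,y))\bigr) \\
% (2) Inverse scope: one particular book was read by every student.
\text{(2)}\quad & \exists y\,\bigl(\mathrm{Book}(y) \wedge \forall x\,(\mathrm{Student}(x) \rightarrow \mathrm{Read}(x,y))\bigr) \\
% (3) Well-formed but wrong: using \wedge under \forall asserts that everything is a student.
\text{(3)}\quad & \forall x\,\bigl(\mathrm{Student}(x) \wedge \exists y\,(\mathrm{Book}(y) \wedge \mathrm{Read}(x,y))\bigr)
\end{align*}
```

Reading (2) entails reading (1), but not the other way around, and (3) matches neither reading of the sentence. A syntax checker accepts all three, so only a check against the intended meaning separates a faithful translation from a plausible-looking one.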
What makes this hard to fix from the inside is that the meaning a model holds isn't stored in one clean place. Mechanistic work finds understanding layered in tiers (features-as-directions, factual world-state, compact circuits), but higher tiers coexist with lower-tier heuristics instead of replacing them, so a model can get the right answer while leaning on the wrong representation (see "Do language models understand in fundamentally different ways?"), and internal structure can stay decoupled from external performance entirely (see "What actually happens inside the minds of language models?"). Worse, models reconstruct meaning you never stated, piecing scattered hints across training into inferences no single document contained (see "Can LLMs reconstruct censored knowledge from scattered training hints?"). So "preserve the right semantics" isn't just about the prompt; it's about a distribution you don't fully control.
The constructive responses in the corpus all push meaning to a more durable level rather than trusting token-by-token flow. Cognitive tools isolate each reasoning operation in its own sandboxed call so semantics can't leak between steps, lifting GPT-4.1 on AIME 2024 from 26.7% to 43.3% without any RL training (see "Can modular cognitive tools unlock reasoning without training?"). Large Concept Models reason over whole-sentence embeddings in a language-agnostic space, preserving propositional meaning above the token (see "Can reasoning happen at the sentence level instead of tokens?"). Retrieval research finds that external knowledge only helps when retrieval and reasoning are tightly coupled rather than bolted together (see "How should systems retrieve and reason with external knowledge?"). The thing none of them can promise is elimination of error: hallucination is formally inevitable for any computable LLM, which is why these are all about *external scaffolding* for meaning rather than an internal fix (see "Can any computable LLM truly avoid hallucinating?"). The quiet lesson: the semantics most necessary to preserve are precisely the ones models are most prone to flatten, namely unstated preconditions, premise-to-conclusion links, and meaning-flipping operators.
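The isolation idea behind cognitive tools can be sketched as follows: each tool builds a fresh prompt from its own instructions plus an explicit payload, so nothing from another step's context can reach it. This is a rough sketch under stated assumptions, not the cited implementation; the `Tool` class, `run_pipeline`, the `LLM` callable type, and the two tool names are hypothetical, and `fake_llm` stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


@dataclass(frozen=True)
class Tool:
    name: str
    instructions: str

    def run(self, llm: LLM, payload: str) -> str:
        # Fresh prompt per call: the tool sees its own instructions and the
        # payload only, never the history of any other step.
        prompt = f"{self.instructions}\n\nINPUT:\n{payload}"
        return llm(prompt)


def run_pipeline(llm: LLM, question: str, tools: list[Tool]) -> dict[str, str]:
    """Run each tool in its own isolated call and collect outputs by tool name."""
    return {tool.name: tool.run(llm, question) for tool in tools}


# Stand-in model that records every prompt it receives.
seen_prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return f"out{len(seen_prompts)}"


tools = [
    Tool("restate", "Restate the problem in your own words."),  # hypothetical tool
    Tool("check", "Check the proposed answer for errors."),     # hypothetical tool
]
results = run_pipeline(fake_llm, "What is 2 + 2?", tools)
print(results)
```

The design choice worth noting is that isolation is enforced by construction rather than requested in a prompt: because `Tool.run` is the only place a prompt is assembled, one step's output reaches another step only if the orchestrating code passes it in as a payload.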
Sources (12 notes)
When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.
LLMs struggle not from lacking world knowledge but from failing to bring background conditions forward as relevant constraints. Prompting that forces explicit enumeration of preconditions raises accuracy from 30% to 85%, revealing the frame problem persists in statistical systems.
McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random premise experiments show models maintain high entailment predictions when hypotheses are attested, which indicates they respond to memorized propositions rather than premise-hypothesis relationships.
LLMs treat presupposition triggers and non-factive verbs as surface cues rather than computing their opposite semantic effects on entailments. This structural failure persists across prompts and models, suggesting models rely on surface patterns instead of structural analysis.
LLMs generate well-formed logical expressions that are semantically incorrect, with errors clustering at scope ambiguity, quantifier precision, and predicate granularity. The asymmetry suggests LLMs understand formal language better than they can generate it.
Mechanistic interpretability reveals conceptual understanding (features as directions), state-of-world understanding (factual connections), and principled understanding (compact circuits). Crucially, higher tiers coexist with lower-tier heuristics rather than replacing them, creating a patchwork of capabilities.
LLMs can achieve identical accuracy while maintaining radically different internal representations, and mechanisms that appear interpretable may not causally drive outputs. This decoupling means performance metrics alone mask crucial differences in how models actually work.
Language models perform out-of-context reasoning across the full training distribution, reconstructing information never explicitly stated in any single document. Experiments show models can infer city identities from scattered distance relationships and apply them downstream without in-context learning.
Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.
Meta's Large Concept Model operates on sentence embeddings rather than tokens, reasoning in a language-agnostic space before decoding to any target language. This hierarchical approach with paragraph-level planning produces more coherent output than flat token generation.
Research shows retrieval should adapt dynamically rather than follow fixed patterns, reasoning and retrieval must integrate closely, and embedding-based retrieval has fundamental limits requiring architectural alternatives.
Three formal theorems prove that any computable LLM must hallucinate on infinitely many inputs, and internal mechanisms like self-correction cannot eliminate this mathematical constraint. External safeguards are therefore necessary, not optional.