INQUIRING LINE


An AI can state a rule, break that rule, notice the mistake — and still not fix it.

Why can't LLMs reason from first principles or initial commitments?

This explores why LLMs struggle to hold a starting rule or premise and follow it through — reasoning from a fixed commitment rather than drifting toward whatever their training data finds familiar.

This reads the question as asking why models can't lock onto an initial principle and reason forward from it consistently. The corpus has a surprisingly clear answer, and it's not what you'd expect: the problem usually isn't that models lack the principle. It's that knowing a principle and executing on it run on separate tracks. Several notes describe a 'split-brain' pattern where models articulate a correct rule and then fail to apply it: 87% accuracy in explanation versus 64% in action (Can language models understand without actually executing correctly?). The related 'Potemkin understanding' work sharpens this: a model can explain a concept, fail to use it, and even recognize its own failure, a triple combination no human reasoner would produce (Can LLMs understand concepts they cannot apply?). So a stated first principle isn't a binding commitment the way it is for a person; it's just more text the model has generated.

Why doesn't the commitment bind? Because the underlying reasoning is semantic association, not symbolic logic. When researchers decouple meaning from the task, keeping the rules valid but stripping the familiar content, model performance collapses even with the correct rule sitting right there in the prompt (Do large language models reason symbolically or semantically?). The model is leaning on what 'sounds right' from its training distribution rather than mechanically following the premise it was given. This is also why models accommodate false starting assumptions: the FLEX benchmark shows them accepting false presuppositions they demonstrably know are wrong, because the fluent continuation pulls harder than the stored fact (Why do language models accept false assumptions they know are wrong?).
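To make "decoupling meaning from the task" concrete, here is a minimal sketch of how a rule can keep its logical shape while losing its familiar content. The function name `decouple`, the `X1`, `X2`, ... symbols, and the example rule are all hypothetical illustrations; this is not the procedure used in the cited work, only one plausible way to build such a probe.

```python
import re

def decouple(text: str, vocabulary: list[str]) -> tuple[str, dict[str, str]]:
    """Swap each familiar content word for an arbitrary symbol.

    The rule's structure (if/and/then) is untouched; only the words a model
    could lean on for 'what sounds right' are replaced. Hypothetical helper.
    """
    if not vocabulary:
        return text, {}
    # Assumed naming scheme: X1, X2, ... in the order the words are listed.
    mapping = {word: f"X{i}" for i, word in enumerate(vocabulary, start=1)}
    # Match whole words only, longest first so overlapping words do not clash.
    alternatives = "|".join(
        re.escape(w) for w in sorted(mapping, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text), mapping

rule = "If something is a bird and is healthy, then it can fly."
abstract_rule, mapping = decouple(rule, ["bird", "healthy", "fly"])
print(abstract_rule)  # If something is a X1 and is X2, then it can X3.
```

A symbolic reasoner should treat both versions identically, since the inference pattern is the same. The note's claim is that model accuracy drops on the second form even though nothing about the rule's validity has changed.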

Reasoning from first principles also requires bringing forward what's unstated: the background conditions a premise quietly depends on. Here the corpus points to a revived version of the classic 'frame problem': models fail not from missing knowledge but from not enumerating the relevant preconditions. Force that enumeration explicitly and accuracy jumps from 30% to 85% (Do language models fail at identifying unstated preconditions?). And even when the first step is sound, the chain wanders. Reasoning models behave like unsystematic explorers rather than methodical searchers, so success probability decays exponentially as a problem gets deeper: fine for shallow problems, catastrophic for long derivations (Why do reasoning LLMs fail at deeper problem solving?). A first-principles argument is exactly the deep, many-step kind that this failure mode punishes hardest.
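The exponential decay is easy to see with a toy model. The sketch below assumes every step succeeds independently with the same probability, which is a simplification introduced here for illustration rather than the model used in the underlying paper; `chain_success` and the 0.95 figure are likewise hypothetical.

```python
def chain_success(p_step: float, depth: int) -> float:
    """Probability that all `depth` steps succeed.

    Assumes each step succeeds independently with probability `p_step`
    (an illustrative simplification, not a claim about any real model).
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be a probability in [0, 1]")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return p_step ** depth

# A step that is right 95% of the time looks reliable in isolation,
# but the chance of an unbroken chain shrinks quickly as depth grows.
for depth in (1, 5, 20, 50):
    print(f"depth {depth:>2}: {chain_success(0.95, depth):.3f}")
```

Under these assumptions a 95%-reliable step yields roughly a 36% chance of an unbroken 20-step chain and under 8% at 50 steps, which is the sense in which shallow problems are fine and long derivations are not.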

The most useful turn here is what the corpus says fixes it, because the fixes reveal the cause. Nearly every remedy works by supplying the structure the model won't impose on itself. Forcing models to check their warrants and backing with explicit critical-question prompts catches failures that ordinary chain-of-thought hides (Can structured argument prompts make LLM reasoning more rigorous?). Offloading the actual inference to a symbolic solver, leaving the LLM only to translate, produces faithful logic with machine-checkable error messages (Can symbolic solvers fix how LLMs reason about logic?). And partial formalization, which enriches natural language with selective symbolic scaffolding rather than fully formalizing it, beats both pure prose and full logic (Why does partial formalization outperform full symbolic logic?). The through-line worth taking away: LLMs can't reliably reason from a commitment because nothing internal holds the commitment in place. These failures and fixes all sit inside a broader map of distinct epistemic failure modes (How do LLMs fail to know what they seem to understand?). The practical upshot is that 'reason from first principles' isn't one capability the model is missing, but a stack of small disciplines (hold the premise, surface the hidden conditions, follow each step, don't drift to the familiar) that the architecture won't enforce unless you build the enforcement around it.
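As a rough picture of what "offloading the inference" means, the sketch below is a tiny forward-chaining checker: the premises and rules are fixed data, and a deterministic loop decides what follows from them. It is not Logic-LM's implementation; `forward_chain`, `check`, and the fact names are hypothetical, and the step where an LLM translates prose into these facts and rules is assumed to happen elsewhere.

```python
from typing import Iterable

# A rule is (premises, conclusion): if every premise is known, add the conclusion.
Rule = tuple[tuple[str, ...], str]

def forward_chain(facts: Iterable[str], rules: Iterable[Rule]) -> set[str]:
    """Derive everything that follows from `facts` under `rules`."""
    rules = list(rules)
    known = set(facts)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known

def check(facts: Iterable[str], rules: Iterable[Rule], goal: str) -> tuple[bool, str]:
    """Return a verdict plus a message a caller could feed back to the translator."""
    derived = forward_chain(facts, rules)
    if goal in derived:
        return True, f"'{goal}' follows from the stated premises"
    return False, f"'{goal}' is not derivable; derived facts: {sorted(derived)}"

# Hypothetical translation of a short argument into facts and rules.
facts = ["socrates_is_human"]
rules = [
    (("socrates_is_human",), "socrates_is_mortal"),
    (("socrates_is_mortal", "socrates_is_philosopher"), "socrates_is_remembered"),
]
print(check(facts, rules, "socrates_is_mortal"))
print(check(facts, rules, "socrates_is_remembered"))
```

The point of the design is that the premises cannot drift: once stated, they are held in `known` and applied mechanically, and a failed derivation comes back as an explicit message rather than a fluent but unsupported conclusion.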

Sources 10 notes

Can language models understand without actually executing correctly?

Large language models can articulate correct principles but systematically fail to apply them due to dissociated instruction and execution pathways. The 87% accuracy in explanations versus 64% in actions reveals this is not knowledge deficit but structural disconnect.

Can LLMs understand concepts they cannot apply?

Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Why do language models accept false assumptions they know are wrong?

The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.

Do language models fail at identifying unstated preconditions?

LLMs struggle not from lacking world knowledge but from failing to bring background conditions forward as relevant constraints. Prompting that forces explicit enumeration of preconditions raises accuracy from 30% to 85%, revealing the frame problem persists in statistical systems.


Why do reasoning LLMs fail at deeper problem solving?

Current reasoning models lack the three properties of systematic exploration: validity, effectiveness, and necessity. This causes success probability to drop exponentially with problem depth, making medium problems solvable but deep problems catastrophically harder.

Can structured argument prompts make LLM reasoning more rigorous?

Applying Toulmin's argument model as explicit prompting steps (CQoT) improves LLM reasoning by forcing models to identify warrants and backing rather than skipping implicit premises. The method catches failures that standard chain-of-thought prompting allows.

Can symbolic solvers fix how LLMs reason about logic?

Logic-LM divides cognitive labor by having LLMs formulate symbolic representations while deterministic solvers execute inference and provide machine-verifiable error messages. This structured feedback loop catches translation errors better than LLM self-critique, improving faithful reasoning without requiring perfect formalization.

Why does partial formalization outperform full symbolic logic?

QuaSAR and Logic-of-Thought both achieve 4-8% accuracy gains by enriching natural language with selective symbolic elements rather than replacing it. Full formalization loses semantic information; pure language lacks structure. Augmentation preserves both.

How do LLMs fail to know what they seem to understand?

LLMs show repeatable, empirically documented failure modes—from Potemkin understanding (correct explanation + failed application) to reasoning collapse under implicit constraints. These failures reveal gaps between statistical pattern-tracking and actual epistemic competence.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst evaluating whether LLMs can commit to and reason from explicit first principles. The question remains open: what architectural or training-level barriers prevent models from locking onto a stated premise and following it through consistently?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat them as snapshots, not current state.
• Models exhibit a 'split-brain' failure: 87% accuracy in *explaining* a rule vs. 64% in *applying* it, suggesting knowing and doing run on separate tracks (2025).
• LLMs reason by semantic association (what 'sounds right' from training) rather than symbolic binding; stripping familiar content while keeping valid rules causes performance to collapse (2023).
• Models accept false presuppositions they demonstrably know are wrong, because fluent continuation overpowers stored fact (2024).
• The 'frame problem' manifests as failure to enumerate unstated preconditions; forcing explicit enumeration raises accuracy from 30% to 85% (2025).
• Reasoning models behave as unsystematic explorers; success probability decays exponentially with chain depth, catastrophic for multi-step derivations (2025).

Anchor papers (verify; mind their dates):
• arXiv:2305.14825 (2023) — semantic vs. symbolic reasoning split
• arXiv:2412.15177 (2024) — critical-question prompts as structural enforcement
• arXiv:2507.10624 (2025) — comprehension-without-competence
• arXiv:2602.06176 (2026) — reasoning failures survey

Your task:
(1) RE-TEST EACH CONSTRAINT. For the 'split-brain', semantic-drift, false-presupposition, and frame-enumeration failures: have newer models (o1, o3, Claude 4, Grok-3, etc.), retrieval-augmented reasoning, longer-context windows, or hybrid symbolic+LLM systems since *relaxed* these? Which remain hard constraints? Cite the paper or system that moved the needle.
(2) Surface the strongest work from the last ~6 months that *contradicts* or *supersedes* the split-brain or semantic-association thesis — or that shows first-principles reasoning *is* now achievable under specific conditions.
(3) Propose 2 new research questions that assume the regime may have shifted: (a) Can finetuning or constitutional AI lock a model to a premise more reliably than prompting? (b) Do multi-agent verification loops, where one model audits another's adherence to stated rules, sidestep the internal binding problem?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
