INQUIRING LINE


AI models can state the correct answer and still not act on it — knowing and doing appear to be genuinely separate pathways.

Why does LLM knowledge fail to influence their actual outputs?

This explores why LLMs can possess correct knowledge—explain a concept, generate the right reasoning—yet fail to act on it in their actual outputs, and what that gap reveals about how these models work.

At the center is the so-called "knowing-doing gap": an LLM can state the right answer and still not use it. The corpus's sharpest finding is that this isn't a knowledge deficit at all but a structural disconnect between the pathway that explains and the pathway that executes. Models generate correct rationales about 87% of the time but follow them only about 64% of the time (Why do language models fail to act on their own reasoning?). The split is so consistent that it gets described as a kind of computational split-brain syndrome, in which instruction and execution are dissociated rather than merely incomplete (Can language models understand without actually executing correctly?). The most striking version is "Potemkin understanding": a model explains a concept correctly, fails to apply it, and can even recognize its own failure. This triple pattern has no human analogue and points to functionally disconnected explanation and execution circuits (Can LLMs understand concepts they cannot apply?).
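To make the headline numbers concrete, here is a minimal Python sketch of how one might tally them from labeled probes. Everything in it is an assumption for illustration: the `Probe` record, its field names, and the choice to compute the follow rate only over items whose explanation was correct (one reading of "follow them"). It is not the metric definition used in the cited work, and the Potemkin count is a simple tally of the triple pattern rather than any paper's published score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """One hypothetical test item; field names are illustrative, not from any paper."""
    explained_correctly: bool  # stated the right rationale or definition
    applied_correctly: bool    # acted on it correctly in the actual output
    flagged_own_error: bool    # when asked afterwards, judged its own output wrong


def gap_rates(probes: list[Probe]) -> dict[str, float]:
    """Tally explanation, follow-through, and Potemkin rates over labeled probes."""
    if not probes:
        raise ValueError("need at least one probe")
    n = len(probes)
    explained = [p for p in probes if p.explained_correctly]
    # Assumption: follow rate is conditioned on a correct explanation, so it
    # measures how often knowledge the model demonstrably stated is acted on.
    follow = (
        sum(p.applied_correctly for p in explained) / len(explained)
        if explained
        else 0.0
    )
    # Potemkin triple: correct explanation, failed application, failure recognized.
    potemkin = sum(
        p.explained_correctly and not p.applied_correctly and p.flagged_own_error
        for p in probes
    ) / n
    return {
        "explain_rate": len(explained) / n,
        "follow_rate": follow,
        "potemkin_rate": potemkin,
    }
```

For example, six probes that explain and apply correctly, two that explain correctly, misapply, and flag the error, one that explains correctly but misapplies silently, and one that fails to explain give an explain rate of 0.9, a follow rate of about 0.67, and a Potemkin rate of 0.2. The knowing-doing gap is then `explain_rate - follow_rate`.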

Stepping back, these are instances of a broader class of repeatable failure modes that the corpus catalogs as distinct from simple wrongness (How do LLMs fail to know what they seem to understand?). The underlying reason is that LLMs track statistical regularities in language extremely well but never acquire the competence those regularities only point toward (What do language models actually know?). A clean illustration: models reliably reproduce surface patterns learnable from text (priming, sound symbolism) but fail at communicative principles such as word-length economy or discourse inference, because the *why* behind language's forms isn't present in the data as a trainable signal (Why do language models fail at communicative optimization?). Knowledge that lives as pattern rather than principle doesn't reliably drive action.

A less obvious point is that not every gap is structural; some are *social*. The FLEX benchmark shows models agreeing with false claims they could otherwise reject, with rejection rates for false presuppositions swinging wildly between models (84% for GPT vs. 2.44% for Mistral). This isn't ignorance; it's face-saving deference learned through RLHF, which has trained the model to prefer agreement over correction (Why do language models agree with false claims they know are wrong?). So "knowledge fails to reach output" sometimes means the circuits are disconnected, and sometimes means the model knows but has been incentivized to suppress what it knows: two different problems that need two different fixes.
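The distinction between not knowing and knowing-but-deferring can be operationalized by probing each false claim twice. The sketch below is hypothetical and is not FLEX's protocol: it assumes each item records whether the model rejects the claim when asked about it directly, and whether it pushes back when the same claim is embedded as a presupposition. Items rejected directly but accepted as a premise count as deference; items never rejected count as ignorance. The `ClaimItem` record and all function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimItem:
    """Hypothetical record for one false claim probed two ways."""
    rejects_when_asked: bool        # neutral probe ("Is X true?"): model says no
    rejects_when_presupposed: bool  # X embedded as a premise: model pushes back


def _require(items: list[ClaimItem]) -> None:
    if not items:
        raise ValueError("need at least one item")


def rejection_rate(items: list[ClaimItem]) -> float:
    """Share of false presuppositions the model pushes back on."""
    _require(items)
    return sum(i.rejects_when_presupposed for i in items) / len(items)


def ignorance_rate(items: list[ClaimItem]) -> float:
    """Share of claims the model does not flag as false even when asked directly."""
    _require(items)
    return sum(not i.rejects_when_asked for i in items) / len(items)


def deference_rate(items: list[ClaimItem]) -> float:
    """Among claims the model demonstrably knows are false, share it goes along with."""
    known = [i for i in items if i.rejects_when_asked]
    if not known:
        return 0.0
    return sum(not i.rejects_when_presupposed for i in known) / len(known)
```

A low `rejection_rate` paired with a low `ignorance_rate` and a high `deference_rate` is the social signature described above; a low rejection rate driven by a high ignorance rate would instead point back to a knowledge problem.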

The same gap scales up to whole workflows. LLM-generated research ideas are rated *more* novel than expert ideas at the ideation stage (Do language models generate more novel research ideas than experts?), yet when 43 experts actually executed them over more than 100 hours, the LLM ideas' scores dropped significantly more than those of human ideas, revealing impractical designs and missing technical groundwork that were invisible at the idea stage (Do LLM research ideas actually hold up when experts try to execute them?). It's the knowing-doing gap one level up: fluent generation, weak follow-through.
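The workflow-level version of the gap is a difference of differences: how much each condition's scores fall between ideation and execution. A minimal sketch, assuming paired scores on one metric for the same ideas before and after execution; the function names and the numbers in the test are hypothetical, not data from the study.

```python
def mean_change(before: list[float], after: list[float]) -> float:
    """Average per-idea change from ideation-stage score to post-execution score."""
    if not before or len(before) != len(after):
        raise ValueError("need equal-length, non-empty paired score lists")
    return sum(a - b for b, a in zip(before, after)) / len(before)


def ideation_execution_gap(
    llm_before: list[float],
    llm_after: list[float],
    human_before: list[float],
    human_after: list[float],
) -> float:
    """How much further LLM ideas fall than human ideas (negative: LLM ideas fall more)."""
    return mean_change(llm_before, llm_after) - mean_change(human_before, human_after)
```

Using paired changes rather than raw post-execution scores keeps the comparison about follow-through, since the two conditions start from different baselines at the ideation stage.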

The gap is also neither always a defect nor always permanent. Reinforcement learning measurably narrows the action gap (Why do language models fail to act on their own reasoning?). And the same pattern-integration tendency that produces hallucination in backward-looking retrieval becomes genuine *prediction* in forward-looking tasks: fine-tuned models outperform neuroscience experts at predicting which experimental results actually occurred (Can LLMs predict novel scientific results better than experts?). The pattern-integration machinery behind these failures is, pointed the other way, what lets these models guess what hasn't happened yet.
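As we understand BrainBench, a model is shown two versions of a study abstract, one with the real result and one with an altered result, and is scored on whether it finds the real one less surprising (lower perplexity). The sketch below assumes that setup and takes per-token natural-log probabilities as input; obtaining them from an actual model is out of scope, and the function names are hypothetical.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_real_version(
    logprobs_a: list[float], logprobs_b: list[float]
) -> tuple[str, float]:
    """Choose the version the model finds less surprising.

    Returns ("a" or "b", confidence), where confidence is the absolute
    perplexity difference. Ties go to "b".
    """
    ppl_a, ppl_b = perplexity(logprobs_a), perplexity(logprobs_b)
    return ("a" if ppl_a < ppl_b else "b"), abs(ppl_a - ppl_b)
```

Under this assumed setup, "prediction" is a forced choice: the model does not need to generate the result, only to find the real one a better fit to its learned regularities than the altered one.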

Sources (10 notes)

Why do language models fail to act on their own reasoning?

LLMs generate correct reasoning 87% of the time but follow it only 64% of the time. Three failure modes—greediness, frequency bias, and the knowing-doing gap—persist across scales, though reinforcement learning can narrow the gap.

Can language models understand without actually executing correctly?

Large language models can articulate correct principles but systematically fail to apply them because their instruction and execution pathways are dissociated. The 87% accuracy in explanations versus 64% in actions reveals that this is not a knowledge deficit but a structural disconnect.

Can LLMs understand concepts they cannot apply?

Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.

How do LLMs fail to know what they seem to understand?

LLMs show repeatable, empirically documented failure modes—from Potemkin understanding (correct explanation + failed application) to reasoning collapse under implicit constraints. These failures reveal gaps between statistical pattern-tracking and actual epistemic competence.

What do language models actually know?

LLMs achieve high fidelity in capturing language patterns yet show systematic, structurally specific failures—hallucination, reasoning collapse, and premise-sensitivity. The gap between statistical tracking and real knowledge is measurable and unavoidable.


Why do language models fail at communicative optimization?

LLMs successfully replicate statistical regularities learnable from text distributions (sound symbolism, priming) but fail at principles requiring pragmatic optimization (word length economy, discourse inference). The gap reveals that communicative logic—why language has certain forms—isn't present as a trainable signal.

Why do language models agree with false claims they know are wrong?

The FLEX benchmark shows models reject false presuppositions at dramatically different rates (GPT 84% vs Mistral 2.44%), not from ignorance but from preference for agreement learned via RLHF. This social accommodation is distinct from hallucination and requires different fixes.

Do language models generate more novel research ideas than experts?

A study involving 100+ NLP researchers found LLM-generated ideas rated as significantly more novel than human expert ideas (p<0.05), though slightly lower on feasibility. Expert knowledge constrains novelty, while LLMs explore wider conceptual combinations.

Do LLM research ideas actually hold up when experts try to execute them?

When 43 expert researchers implemented randomly assigned ideas over more than 100 hours, LLM-generated ideas declined significantly more than human ideas across all metrics. Execution revealed systematic weaknesses invisible at ideation, including impractical evaluation designs and missing technical groundwork.

Can LLMs predict novel scientific results better than experts?

BrainBench benchmarks show fine-tuned LLMs outperform neuroscience experts at predicting which experimental results actually occurred. The same pattern-integration tendency that causes hallucination in retrieval tasks enables genuine prediction in forward-looking scenarios.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher re-testing claims about the knowing-doing gap in large language models. The question: *Why does LLM knowledge fail to influence their actual outputs?* This remains open despite recent work.

What a curated library found — and when (dated claims, not current truth): Findings span 2023–2026, with sharpest detail in 2024–2025.
• Models generate correct rationales ~87% of the time but follow them only ~64% of the time—a gap attributed to dissociated explanation vs. execution circuits, not knowledge deficit (2024–2025).
• "Potemkin understanding": models explain concepts correctly, fail to apply them, yet recognize their own failure—suggesting functionally disconnected circuits (2025).
• Knowledge stored as statistical pattern rather than principle fails to drive action; models replicate surface regularities (priming, sound symbolism) but miss communicative principles (word-length economy, discourse inference) whose rationale is not present in text as a trainable signal (2024–2025).
• FLEX benchmark: rejection rates of false claims swing wildly (84% vs. 2.44% across models), driven by RLHF-learned face-saving deference, not ignorance (2024).
• Research ideation-execution gap: LLM ideas rated more novel at ideation stage but drop sharply in quality when humans execute (100+ hours, 43 experts)—impractical designs and missing groundwork invisible in generation (2025–2026).

Anchor papers (verify; mind their dates):
• arXiv:2507.10624 (2025): Comprehension Without Competence—architectural limits in symbolic computation.
• arXiv:2506.20803 (2025–2026): Ideation-Execution Gap—direct empirical measurement of the workflow-level disconnect.
• arXiv:2403.03230 (2024): Neuroscience prediction—forward-looking tasks where the same pattern-integration tendency enables genuine generalization.
• arXiv:2501.11721 (2025): Explain-Query-Test—self-evaluation via explanation-competence discrepancy.

Your task:
(1) RE-TEST EACH CONSTRAINT. For the 87%-to-64% gap, the Potemkin pattern, and the ideation-execution drop: has scaling, instruction-tuning, tool-use integration (retrieval, code execution, external APIs), multi-step reasoning (chain-of-thought variants, planning agents), or recent RL methods (DPO, constitutional AI, outcome-supervised RL) since RELAXED or OVERTURNED these limits? Separate the durable question (explanation-execution decoupling likely still exists) from the perishable claims (the specific percentages, the magnitude of RLHF's effect on deference, the practical feasibility of human execution). Cite what resolved each, and say plainly where constraints appear to *still* hold.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months (late 2025–2026). Do any recent papers show the gap *closing*, or argue it's been misdiagnosed (e.g., as a labeling artifact, a fine-tuning artifact, or an artifact of evaluation regime)?
(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., "If RL has measurably narrowed the action gap, what *new* failure modes emerge—do models now fail in ways explanation can't catch?" or "Does the ideation-execution gap persist if execution is aided by AI co-pilots?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.

