SYNTHESIS NOTE

Why do language models fail to act on their own reasoning?

LLMs produce correct explanations far more often than they produce correct actions. What causes this knowing-doing gap, and can training methods close it?

Synthesis note · 2026-02-22 · sourced from Reinforcement Learning

Three systematic failure modes explain why LLMs perform sub-optimally in sequential decision-making: greediness (premature commitment to exploitative strategies, leaving up to 55% of the action space unexplored), frequency bias (small models copying the most frequent actions regardless of reward), and the knowing-doing gap (producing correct rationales but failing to act on them).
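Greediness and frequency bias can both be read off an agent's action log as simple per-episode statistics. The sketch below is an illustration under stated assumptions, not the metric definitions used in the source study. It assumes actions are integer arm indices in a K-armed bandit, and that "most frequent action" means the most frequent of the agent's own earlier actions in the episode. The helper names `unexplored_fraction` and `frequency_copy_rate` are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def unexplored_fraction(actions: Sequence[int], num_actions: int) -> float:
    """Share of the action space never tried in one episode (a greediness proxy)."""
    return 1.0 - len(set(actions)) / num_actions


def frequency_copy_rate(actions: Sequence[int], best_action: Optional[int] = None) -> float:
    """Share of steps whose action repeats the most frequent earlier action.

    If best_action is given, steps where the most frequent earlier action IS the
    best action are skipped, so exploiting a genuinely good arm is not scored as
    frequency bias. Ties go to the action seen first, because Counter.most_common
    orders equal counts by first insertion.
    """
    copies, scored = 0, 0
    for t in range(1, len(actions)):
        mode, _ = Counter(actions[:t]).most_common(1)[0]
        if best_action is not None and mode == best_action:
            continue  # copying the optimal arm is indistinguishable from exploitation
        copies += int(actions[t] == mode)
        scored += 1
    return copies / scored if scored else 0.0


# Toy episode on a hypothetical 10-armed bandit: try arms 0 and 1, then lock onto arm 1.
episode = [0, 1, 1, 1, 1, 1, 1, 1]
print(unexplored_fraction(episode, num_actions=10))  # 0.8
print(frequency_copy_rate(episode))                  # 5/7, about 0.714
print(frequency_copy_rate(episode, best_action=1))   # 0.0
```

The `best_action` filter is one way to approximate "regardless of reward": repeating the most frequent arm only counts as bias when that arm is not the best one.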

The knowing-doing gap is the most conceptually significant finding. When LLMs generate chain-of-thought rationales about how to solve a decision-making task, 87% of the rationales are correct — yet only 64% of the subsequent actions follow the rationale's recommendation. The model knows what to do but defaults to greedy behavior instead of following its own reasoning.
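The two percentages can be computed from step-level annotations of a "reason, then act" loop. A minimal sketch, assuming each step has already been labelled (by hand or by a checker) with whether the rationale was correct and whether the emitted action matched the rationale's recommendation. The `Step` record and `knowing_doing_rates` are hypothetical names. Because this note does not state whether the 64% is taken over all steps or only over steps with correct rationales, the sketch reports both.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    """Hypothetical per-step annotation of a CoT-then-act decision."""
    rationale_correct: bool  # did the rationale identify the right action?
    action_follows: bool     # did the emitted action match the rationale's recommendation?


def knowing_doing_rates(steps: List[Step]) -> Dict[str, float]:
    if not steps:
        raise ValueError("need at least one annotated step")
    n = len(steps)
    knows = [s for s in steps if s.rationale_correct]
    return {
        # "Knowing": share of rationales that are correct.
        "rationale_correct": len(knows) / n,
        # "Doing": share of all actions that follow the rationale.
        "action_follows": sum(s.action_follows for s in steps) / n,
        # Conditional view: among steps where the model knows, how often does it act on it?
        "follows_given_correct": (
            sum(s.action_follows for s in knows) / len(knows) if knows else 0.0
        ),
    }


# Toy data (not from the study): 9 correct rationales, only 6 of them acted on.
toy = [Step(True, True)] * 6 + [Step(True, False)] * 3 + [Step(False, False)]
print(knowing_doing_rates(toy))
```

On this toy log the knowing rate is 0.9 while the following rate is 0.6; the gap between the two is the quantity the note calls the knowing-doing gap.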

Scale partially helps: larger models (27B) show less frequency bias but remain greedy. RL fine-tuning on self-generated CoT rationales mitigates all three failure modes, increasing exploration and aligning actions more closely with rationales. This suggests the knowing-doing gap is trainable rather than architectural.

This connects directly to the concept of Potemkin understanding. As Can LLMs understand concepts they cannot apply? describes, a model can explain a concept correctly yet fail to apply it. The knowing-doing gap is a measurable instance of exactly this pattern, in which the model demonstrates understanding in its rationale but fails in its action selection. The quantified gap (87% vs 64%) gives the Potemkin understanding concept empirical grounding.

The deeper implication is that CoT reasoning and action selection may involve different computational pathways. In line with Do language models actually use their encoded knowledge?, which asks whether internally encoded facts causally influence what a model generates, the knowing-doing gap may reflect a disconnect: the reasoning trace is generated through one pathway, while action selection draws on different (shallower, more habitual) computations.

Alice in Wonderland: the overconfidence amplifier. The "Alice in Wonderland" paper demonstrates a dramatic instance of the knowing-doing gap on a trivially simple reasoning problem: "Alice has N brothers and M sisters. How many sisters does Alice's brother have?" (The correct answer is M + 1: Alice's M sisters plus Alice herself.) Most SOTA models collapse entirely on this problem, producing incorrect answers with strong overconfidence and offering "reasoning-like explanations akin to confabulations" to justify clearly failed responses. Standard interventions (enhanced prompting, multi-step re-evaluation) fail to recover correct answers. The confabulation-like quality of these justifications directly parallels the knowing-doing gap: the model generates plausible reasoning traces that do not correspond to correct computation. Notable exceptions are Claude 3 Opus and GPT-4, which occasionally succeed but still fail frequently, suggesting the problem is shared across current models rather than model-specific.
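Because the problem has a closed-form answer, checking a model's reply is trivial, which is what makes the collapse striking. A minimal scoring sketch, assuming the model's reply has already been parsed into an integer. The helper names `aiw_ground_truth`, `aiw_prompt`, and `aiw_accuracy` are hypothetical, and the template wording comes from the problem statement quoted above, not from the paper's exact prompt set.

```python
from typing import Dict, Tuple


def aiw_ground_truth(n_brothers: int, m_sisters: int) -> int:
    """Correct answer to "Alice has N brothers and M sisters. How many sisters
    does Alice's brother have?": Alice's M sisters plus Alice herself.
    N does not affect the answer."""
    if n_brothers < 1:
        raise ValueError("the question presupposes at least one brother")
    return m_sisters + 1


def aiw_prompt(n_brothers: int, m_sisters: int) -> str:
    """Fill the problem template for one (N, M) variation."""
    return (f"Alice has {n_brothers} brothers and {m_sisters} sisters. "
            "How many sisters does Alice's brother have?")


def aiw_accuracy(answers: Dict[Tuple[int, int], int]) -> float:
    """Fraction of (N, M) variations whose parsed model answer is correct."""
    if not answers:
        raise ValueError("no answers to score")
    correct = sum(ans == aiw_ground_truth(n, m) for (n, m), ans in answers.items())
    return correct / len(answers)


# Hypothetical parsed answers; the second one forgets to count Alice herself.
print(aiw_prompt(3, 6))
print(aiw_accuracy({(3, 6): 7, (4, 2): 2}))  # 0.5
```

Scoring over several (N, M) variations rather than a single instance guards against a model matching one memorised answer.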

Inquiring lines that read this note

This note is a source for the following research framings.

How faithfully do LLMs reflect their actual reasoning in outputs and explanations?

What critical LLM failures do standard benchmarks hide?

Why do language models fail at planning despite understanding strategies?

Why do agents confidently report success despite actually failing tasks?

What makes action-producing models fail in ways text models typically do not?

Do language models perform faithful symbolic reasoning independent of semantic grounding?

Do language models learn genuine linguistic structure or just surface patterns?

Why do LLMs understand efficient language but fail to produce it?

How can models identify insufficient information and respond appropriately without guessing?

Can a model predict the right action but execute the wrong one?

Why do LLM chatbots fail as independent therapeutic agents?

Why do LLMs understand therapy techniques but fail to execute them?

What capability tradeoffs emerge when scaling model reasoning abilities?

Why do strong models struggle more with instruction following than mid-tier ones?

Do base models contain latent reasoning that training can unlock?

What makes a model fail to activate relevant skills from its own harness?

Related concepts in this collection

Can LLMs understand concepts they cannot apply? Explores whether large language models can correctly explain ideas while simultaneously failing to use them—and whether that combination reveals something fundamentally different from ordinary mistakes.
instantiates: the 87%/64% gap is a quantified example of Potemkin understanding
Do language models actually use their encoded knowledge? Probes can detect that LMs encode facts internally, but do those encoded facts causally influence what the model generates? This explores the gap between knowing and doing.
explains: action selection may bypass the reasoning trace entirely
Does chain of thought reasoning actually explain model decisions? When language models show their reasoning steps in agentic pipelines, does the quality of those steps predict or explain the quality of final outputs? This matters for trusting and debugging AI systems.
parallels: both show reasoning traces decoupled from downstream behavior
Does RL post-training create reasoning or just deploy it? Investigates whether reasoning capability emerges during RL fine-tuning or already exists in base models. Matters because it reshapes how we build and optimize reasoning systems.
complicates: RL fine-tuning can narrow the knowing-doing gap, suggesting RL teaches something beyond when to deploy existing reasoning
What limits how much models can improve themselves? Explores whether self-improvement has fundamental boundaries set by how well models can verify versus generate solutions, and what this means across different task types.
the knowing-doing gap (87% correct rationales vs 64% rationale-consistent actions) is an empirical instance of the generation-verification gap in decision-making: RL fine-tuning narrows this gap, consistent with the formal prediction that self-improvement operates precisely where verification exceeds generation
