SYNTHESIS NOTE
Why do language models fail to act on their own reasoning?

LLMs produce correct explanations far more often than they produce correct actions. What causes this knowing-doing gap, and can training methods close it?

Synthesis note · 2026-02-22 · sourced from Reinforcement Learning

Three systematic failure modes explain why LLMs perform sub-optimally in sequential decision-making: greediness (premature commitment to exploitative strategies, leaving up to 55% of the action space unexplored), frequency bias (small models copying the most frequent actions regardless of reward), and the knowing-doing gap (producing correct rationales but failing to act on them).
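As a rough illustration of how the first two failure modes could be quantified from a logged action sequence, here is a minimal sketch. The function names, the toy trajectory, and the exact metric definitions are assumptions made for illustration; the source does not specify how the underlying study computed its figures.

```python
from collections import Counter


def action_coverage(actions, num_actions):
    """Fraction of the action space tried at least once (hypothetical metric)."""
    return len(set(actions)) / num_actions


def frequency_following_rate(actions):
    """Share of steps (from the second onward) whose action equals the most
    frequent action seen so far. A crude proxy for frequency bias: it ignores
    rewards entirely, which is the point of the failure mode."""
    if len(actions) < 2:
        return 0.0
    counts = Counter()
    hits = 0
    for i, action in enumerate(actions):
        if i > 0:
            most_frequent_so_far = counts.most_common(1)[0][0]
            hits += int(action == most_frequent_so_far)
        counts[action] += 1
    return hits / (len(actions) - 1)


# Toy trajectory on a hypothetical 10-armed bandit: the agent tries three
# arms, then commits to arm 2 for the rest of the episode.
trajectory = [0, 1, 2, 2, 2, 2, 2, 2, 2, 2]
coverage = action_coverage(trajectory, num_actions=10)  # 0.3
unexplored = 1 - coverage                               # 70% of arms never tried
following = frequency_following_rate(trajectory)        # about 0.67
```

A greedy agent shows up as low coverage; a frequency-biased agent shows up as a high following rate even when the most frequent action is not the best-rewarded one.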

The knowing-doing gap is the most conceptually significant finding. When LLMs generate chain-of-thought rationales about how to solve a decision-making task, 87% of the rationales are correct — yet only 64% of the subsequent actions follow the rationale's recommendation. The model knows what to do but defaults to greedy behavior instead of following its own reasoning.
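The two percentages can be read as two separate agreement rates over the same set of decision steps. The sketch below shows one way to compute them, assuming each step has been annotated with the action the rationale recommends, a reference optimal action, and the action actually taken. The `Step` record and `knowing_doing_gap` function are hypothetical names, and the original study's protocol may differ (for example, it may condition on correct rationales only).

```python
from dataclasses import dataclass


@dataclass
class Step:
    rationale_action: int  # action the model's rationale recommends
    optimal_action: int    # action a reference (e.g. oracle) policy would take
    chosen_action: int     # action the model actually emitted


def knowing_doing_gap(steps):
    """Return (knowing, doing, gap) as fractions of all steps.

    knowing: rationale recommends the reference-optimal action
    doing:   chosen action matches the rationale's recommendation
    """
    if not steps:
        raise ValueError("need at least one step")
    n = len(steps)
    knowing = sum(s.rationale_action == s.optimal_action for s in steps) / n
    doing = sum(s.chosen_action == s.rationale_action for s in steps) / n
    return knowing, doing, knowing - doing


# Toy data (invented for illustration, not taken from the study)
steps = [
    Step(rationale_action=1, optimal_action=1, chosen_action=1),
    Step(rationale_action=2, optimal_action=2, chosen_action=0),
    Step(rationale_action=3, optimal_action=3, chosen_action=0),
    Step(rationale_action=0, optimal_action=1, chosen_action=0),
]
knowing, doing, gap = knowing_doing_gap(steps)  # 0.75, 0.5, 0.25
```

Under this reading, the reported figures correspond to knowing = 0.87 and doing = 0.64.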

Scale partially helps: larger models (27B) show less frequency bias but remain greedy. RL fine-tuning on self-generated CoT rationales mitigates all three failure modes by increasing exploration and aligning actions with rationales. This suggests the gap is trainable, not architectural.

This connects directly to the concept of Potemkin understanding, discussed in the note "Can LLMs understand concepts they cannot apply?". The knowing-doing gap is a measurable instance of exactly this pattern: the model demonstrates understanding in its rationale but fails in its action selection. The quantified gap (87% vs 64%) gives the Potemkin understanding concept empirical grounding.

The deeper implication is that CoT reasoning and action selection may involve different computational pathways. In line with the note "Do language models actually use their encoded knowledge?", the knowing-doing gap may reflect a disconnect in which the reasoning trace is generated through one pathway while action selection draws on different (shallower, more habitual) computations.

Alice in Wonderland: the overconfidence amplifier. The "Alice in Wonderland" paper demonstrates a dramatic instance of the knowing-doing gap on trivially simple reasoning: "Alice has N brothers and M sisters. How many sisters does Alice's brother have?" Most SOTA models collapse entirely on this problem, producing incorrect answers with strong overconfidence while providing "reasoning-like explanations akin to confabulations" to justify clearly failed responses. Standard interventions (enhanced prompting, multi-step re-evaluation) fail to recover correct answers. The confabulation-like quality of the justifications directly parallels the knowing-doing gap: the model generates plausible reasoning traces that do not correspond to correct computation. Claude 3 Opus and GPT-4 are notable exceptions that occasionally succeed, but they still fail frequently, which suggests the problem is systemic across current models rather than specific to any one of them.
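For reference, the ground truth of the quoted problem is a single addition: each of Alice's brothers has all of Alice's sisters plus Alice herself, so the answer is M + 1, and N is irrelevant as long as it is at least 1. The sketch below encodes that under the usual reading of the puzzle, namely that Alice is female and all siblings share the same parents. The function name is hypothetical.

```python
def sisters_of_alices_brother(n_brothers: int, m_sisters: int) -> int:
    """Ground-truth answer to the quoted problem.

    Assumes Alice is female and all siblings share the same parents.
    """
    if n_brothers < 1:
        raise ValueError("the question presupposes Alice has at least one brother")
    if m_sisters < 0:
        raise ValueError("number of sisters cannot be negative")
    # Alice's M sisters, plus Alice herself. The brother count does not matter.
    return m_sisters + 1


answer = sisters_of_alices_brother(n_brothers=3, m_sisters=2)  # 3
```

The distractor is N: the number of brothers is present in the prompt but plays no role in the answer.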

Original note title

llms are greedy agents with a knowing-doing gap — correct rationales 87 percent but greedy actions 64 percent