SYNTHESIS NOTE

Why do accurate predictions lead to poor decisions?

Predictive models are built to fit data, not to optimize decision outcomes. This note explores when and why accurate forecasts fail to produce good choices.

Synthesis note · 2026-02-22 · sourced from LLM Architecture

"All AI Models Are Wrong, but Some are Optimal" (2501.06086) formalizes a gap that practitioners experience intuitively: accurate prediction does not guarantee good decisions. The paper establishes necessary and sufficient conditions for a predictive model (AI-based or not) to support optimal sequential decision-making.

The core problem: predictive models are typically constructed to approximate the real system's future behavior as closely as possible. But real systems are stochastic, and even with abundant data, the model is always an approximation. The construction of the predictive model is generally agnostic to the decision-making objectives — it has no direct relationship to the performance measure of the resulting decisions.

This matters because sequential decision-making requires accounting for future uncertainty, the availability of new information for future decisions, and both short- and long-term consequences. A model that predicts accurately on average may systematically mispredict in the states that matter most for decision quality. The related note "Can utility-weighted training loss actually harm model performance?" identifies a precise mechanism: the loss function shapes gradients for both representation learning and decision-making simultaneously, and optimizing one can weaken the other.
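A small numeric sketch can make the gap concrete. It is a toy single-period ordering problem, not the sequential setting or the conditions from the paper, and every number and name in it (`DEMANDS`, `UNDERAGE_COST`, `OVERAGE_COST`) is a hypothetical chosen for illustration. It compares a forecast that minimizes mean squared error with a deliberately biased one, assuming the decision rule "order exactly what was forecast".

```python
# Toy single-period ordering problem (hypothetical numbers, not from the paper).
# Demand takes each listed value with equal probability.
DEMANDS = [50, 80, 100, 120, 150]

UNDERAGE_COST = 9.0  # assumed cost per unit of unmet demand
OVERAGE_COST = 1.0   # assumed cost per unit of unsold stock


def mse(forecast):
    """Mean squared error of a constant point forecast over the demand distribution."""
    return sum((d - forecast) ** 2 for d in DEMANDS) / len(DEMANDS)


def expected_cost(order):
    """Expected cost of ordering `order` units under the asymmetric costs."""
    total = 0.0
    for d in DEMANDS:
        total += UNDERAGE_COST * max(d - order, 0) + OVERAGE_COST * max(order - d, 0)
    return total / len(DEMANDS)


mean_forecast = sum(DEMANDS) / len(DEMANDS)  # the constant forecast that minimizes MSE
high_forecast = max(DEMANDS)                 # deliberately biased upward

for name, f in [("mean", mean_forecast), ("high", high_forecast)]:
    print(f"{name}: forecast={f}, mse={mse(f)}, cost if order=forecast: {expected_cost(f)}")
```

With these assumed costs, the mean forecast has the lower error (MSE 1160 against 3660) but the higher expected cost (140 against 50). The forecast that fits the data best is not the one that supports the best decision, because the error metric is symmetric while the cost of the decision is not.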

The connection to reward models is direct. As discussed in "Do reward models actually consider what the prompt asks?", reward models exhibit exactly this prediction-decision gap: they predict quality accurately on average but fail to condition on the decision-relevant information (the prompt). The formal framework here provides theoretical grounding for why prompt-insensitive reward models produce suboptimal alignment.
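The following toy sketch illustrates how a prompt-insensitive scorer can look accurate and still select badly. It is not a real reward model: the candidate data, the `true_utility` definition, and the `prompt_blind_score` function are all hypothetical, and "accuracy" is assumed to be measured only on pairs of on-topic responses.

```python
from itertools import combinations

# Hypothetical candidate responses: (topic addressed, generic fluency in [0, 1]).
CANDIDATES = [
    ("cooking", 0.9),
    ("cooking", 0.6),
    ("tax law", 0.7),
    ("tax law", 0.5),
    ("tax law", 0.4),
]


def true_utility(prompt_topic, response):
    """Assumed ground truth: a response is only useful if it addresses the prompt."""
    topic, fluency = response
    return fluency if topic == prompt_topic else 0.0


def prompt_blind_score(prompt_topic, response):
    """Toy reward model that ignores the prompt and scores fluency alone."""
    _, fluency = response
    return fluency


def pairwise_accuracy(prompt_topic, responses, scorer):
    """Fraction of response pairs the scorer orders the same way as true utility."""
    pairs = list(combinations(responses, 2))
    agree = sum(
        (scorer(prompt_topic, a) > scorer(prompt_topic, b))
        == (true_utility(prompt_topic, a) > true_utility(prompt_topic, b))
        for a, b in pairs
    )
    return agree / len(pairs)


def best_of_n(prompt_topic, responses, scorer):
    """Decision step: pick the response the scorer rates highest."""
    return max(responses, key=lambda r: scorer(prompt_topic, r))


prompt = "tax law"
on_topic = [r for r in CANDIDATES if r[0] == prompt]

accuracy = pairwise_accuracy(prompt, on_topic, prompt_blind_score)
picked = best_of_n(prompt, CANDIDATES, prompt_blind_score)
ideal = best_of_n(prompt, CANDIDATES, true_utility)

print("accuracy on on-topic pairs:", accuracy)
print("picked:", picked, "utility:", true_utility(prompt, picked))
print("ideal: ", ideal, "utility:", true_utility(prompt, ideal))
```

In this toy setup the prompt-blind scorer ranks every on-topic pair correctly, yet when the candidate pool also contains fluent off-topic text it selects a response with zero utility for the prompt. The predictions are accurate where they are measured, and the decision fails where it matters.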

As discussed in "Why do language models fail to act on their own reasoning?", the prediction-decision gap also appears within a single model: the model can predict what the right action is (in its rationale) but fails to execute it (in its greedy action). Good prediction, suboptimal decision.

Inquiring lines that read this note 5

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

Can model confidence signals reliably improve reasoning quality and calibration?

What makes accurate confidence different from confident-but-wrong predictions?

How can models identify insufficient information and respond appropriately without guessing?

Can a model predict the right action but execute the wrong one?

When does architectural design matter more than raw model capacity?

Why do macro and micro forecasting scales require different reasoning approaches?

Why do benchmark improvements fail to reflect actual reasoning quality?

Why are post-cutoff test sets essential for evaluating genuine forecasting ability?

When should retrieval-augmented systems decide to fetch new information?

What role does retrieval mechanism design play in forecast accuracy?

Related concepts in this collection 5

Do reward models actually consider what the prompt asks?
Exploring whether standard reward models evaluate responses based on prompt context or just response quality alone. This matters because if models ignore prompts, they'll fail to align with what users actually want.
reward models exemplify prediction-decision misalignment

Why do language models fail to act on their own reasoning?
LLMs produce correct explanations far more often than they produce correct actions. What causes this knowing-doing gap, and can training methods close it?
knowing-doing gap as individual-level prediction-decision mismatch

Can LLMs understand concepts they cannot apply?
Explores whether large language models can correctly explain ideas while simultaneously failing to use them, and whether that combination reveals something fundamentally different from ordinary mistakes.
correct explanation (prediction) with failed application (decision)

Can utility-weighted training loss actually harm model performance?
When engineers weight loss functions to reflect real-world costs of different errors, does this improve or undermine learning? This explores whether baking asymmetric objectives into training creates unintended side effects.
provides the mechanism for the prediction-decision gap: ML models perform two tasks (learning features and making decisions), and loss functions that optimize decision quality can weaken representation learning; the recommendation to train with standard loss then adjust ex-post is an instance of the principle that prediction and decision should be separated

Does binary reward training hurt model calibration?
Explores whether the standard correctness-based reward in RL training creates incentives for overconfident predictions, and what structural problem causes calibration to degrade during optimization.
calibration degradation is a specific manifestation of the prediction-decision gap: binary reward optimizes for correct answers (decision) while degrading the model's probability estimates (prediction); RLCR's fix of adding a proper scoring rule is an explicit separation of the prediction and decision objectives within the reward function
