Why do accurate predictions lead to poor decisions?
Predictive models are built to fit data, not to optimize decision outcomes. This note explores when and why accurate forecasts fail to produce good choices.
"All AI Models Are Wrong, but Some are Optimal" (2501.06086) formalizes a gap that practitioners experience intuitively: accurate prediction does not guarantee good decisions. The paper establishes necessary and sufficient conditions for a predictive model (AI-based or not) to support optimal sequential decision-making.
The core problem: predictive models are typically constructed to approximate the real system's future behavior as closely as possible. But real systems are stochastic, and even with abundant data, the model is always an approximation. The construction of the predictive model is generally agnostic to the decision-making objectives — it has no direct relationship to the performance measure of the resulting decisions.
This matters because sequential decision-making requires accounting for future uncertainty, the availability of new information for future decisions, and both short- and long-term consequences. A model that predicts accurately on average may systematically mispredict in the states that matter most for decision quality. The note "Can utility-weighted training loss actually harm model performance?" identifies the mechanism: the loss function shapes gradients for both representation learning and decision-making simultaneously, so optimizing one can weaken the other.
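To make the gap concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything taken from the paper: the toy demand series, the two hypothetical forecasters (`model_a`, `model_b`), the plug-in decision rule (order exactly what is forecast), and the asymmetric cost in which a unit of shortage costs ten times a unit of surplus. It shows a forecaster with lower mean squared error producing costlier decisions, because its one large error falls on the state where errors are expensive.

```python
def mse(forecasts, actuals):
    """Mean squared error: the symmetric accuracy metric the forecaster is judged on."""
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)


def mean_decision_cost(forecasts, actuals, shortage_cost=10.0, surplus_cost=1.0):
    """Average cost of ordering exactly the forecast quantity each period.

    Assumption: a plug-in decision rule and asymmetric linear costs,
    chosen only for illustration.
    """
    total = 0.0
    for forecast, actual in zip(forecasts, actuals):
        order = forecast  # plug-in decision: act as if the forecast were true
        shortage = max(actual - order, 0)
        surplus = max(order - actual, 0)
        total += shortage_cost * shortage + surplus_cost * surplus
    return total / len(actuals)


# Toy demand: nine ordinary periods and one spike (the state that matters most).
actuals = [10] * 9 + [50]

# Hypothetical model A: exact on ordinary periods, badly short on the spike.
model_a = [10] * 9 + [30]

# Hypothetical model B: biased 7 units high in every period, including the spike.
model_b = [a + 7 for a in actuals]

for name, forecasts in [("model_a", model_a), ("model_b", model_b)]:
    print(
        f"{name}: MSE={mse(forecasts, actuals):.1f}, "
        f"mean cost={mean_decision_cost(forecasts, actuals):.1f}"
    )
```

In this toy setting `model_a` wins on MSE (40 versus 49) but loses on mean decision cost (20 versus 7). The ranking flips because MSE treats all errors symmetrically while the decision cost does not.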
The connection to reward models is direct. As described in "Do reward models actually consider what the prompt asks?", reward models exhibit exactly this prediction-decision gap: they predict quality accurately on average but fail to condition on the decision-relevant information (the prompt). The formal framework here provides theoretical grounding for why prompt-insensitive reward models produce suboptimal alignment.
As described in "Why do language models fail to act on their own reasoning?", the prediction-decision gap also appears at the level of the individual model: the model can identify the right action in its rationale but fails to take it in its greedy action. Good prediction, suboptimal decision.
Inquiring lines that use this note as a source (5)
This note is a source for these synthesized inquiries.
- What makes accurate confidence different from confident-but-wrong predictions?
- Can a model predict the right action but execute the wrong one?
- Why do macro and micro forecasting scales require different reasoning approaches?
- Why are post-cutoff test sets essential for evaluating genuine forecasting ability?
- What role does retrieval mechanism design play in forecast accuracy?
Related concepts in this collection (5)
- Do reward models actually consider what the prompt asks?
  Exploring whether standard reward models evaluate responses based on prompt context or just response quality alone. This matters because if models ignore prompts, they'll fail to align with what users actually want.
  reward models exemplify prediction-decision misalignment
- Why do language models fail to act on their own reasoning?
  LLMs produce correct explanations far more often than they produce correct actions. What causes this knowing-doing gap, and can training methods close it?
  knowing-doing gap as individual-level prediction-decision mismatch
- Can LLMs understand concepts they cannot apply?
  Explores whether large language models can correctly explain ideas while simultaneously failing to use them, and whether that combination reveals something fundamentally different from ordinary mistakes.
  correct explanation (prediction) with failed application (decision)
- Can utility-weighted training loss actually harm model performance?
  When engineers weight loss functions to reflect real-world costs of different errors, does this improve or undermine learning? This explores whether baking asymmetric objectives into training creates unintended side effects.
  provides the mechanism for the prediction-decision gap: ML models perform two tasks (learning features and making decisions), and loss functions that optimize decision quality can weaken representation learning; the recommendation to train with standard loss then adjust ex-post is an instance of the principle that prediction and decision should be separated
- Does binary reward training hurt model calibration?
  Explores whether the standard correctness-based reward in RL training creates incentives for overconfident predictions, and what structural problem causes calibration to degrade during optimization.
  calibration degradation is a specific manifestation of the prediction-decision gap: binary reward optimizes for correct answers (decision) while degrading the model's probability estimates (prediction); RLCR's fix of adding a proper scoring rule is an explicit separation of the prediction and decision objectives within the reward function
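The role a proper scoring rule plays in the last item can be illustrated with a minimal sketch. It assumes a simplified setting that is not taken from the RLCR work itself: a model answers correctly with probability `p_correct`, reports a `stated_confidence`, and the Brier score stands in for the proper scoring rule. The function names are hypothetical. Under these assumptions, a correctness-only reward is indifferent to the stated confidence, while the expected Brier penalty is smallest when the stated confidence equals the true probability of being correct.

```python
def expected_binary_reward(p_correct, stated_confidence):
    """Expected reward when reward is 1 for a correct answer and 0 otherwise.

    The stated confidence is deliberately ignored: this reward does not see it.
    """
    return p_correct


def expected_brier_penalty(p_correct, stated_confidence):
    """Expected Brier score (q - y)^2, with outcome y = 1 with probability p_correct."""
    q = stated_confidence
    return p_correct * (q - 1.0) ** 2 + (1.0 - p_correct) * q ** 2


p = 0.7  # assumed true probability that the answer is correct
grid = [i / 100 for i in range(101)]  # candidate confidence reports

# The confidence report that minimizes the expected Brier penalty.
best_q = min(grid, key=lambda q: expected_brier_penalty(p, q))

# The distinct expected binary rewards across all confidence reports.
binary_rewards = {expected_binary_reward(p, q) for q in grid}

print(f"Brier-optimal confidence: {best_q:.2f}")
print(f"Distinct binary rewards across reports: {len(binary_rewards)}")
```

In this sketch the Brier-optimal report is 0.70, matching `p`, and the binary reward takes a single value regardless of the report, so it gives no signal about whether the confidence is honest.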
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- All AI Models are Wrong, but Some are Optimal
- Open Problems in Mechanistic Interpretability
- Chain-of-Thought Is Not Explainability
- Should Humans Lie to Machines? The Incentive Compatibility of Lasso and General Weighted Lasso
- AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
- Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?
- Learning To Guide Human Experts Via Personalized Large Language Models
- KTO: Model Alignment as Prospect Theoretic Optimization
Original note title
predictive AI models optimized for data fit produce suboptimal decisions — formal conditions define when prediction enables optimal policy