Can models learn to stop thinking when a question lacks necessary information?
This explores whether models can be trained to recognize that a question is missing the information needed to answer it — and then disengage rather than grinding out reasoning anyway.
The corpus says yes, with a twist worth knowing: the instinct to keep thinking is something training actively rewards, so models have to be taught to override it.
The sharpest finding is that reasoning models are actually *worse* at this than plain ones. When a problem is missing a premise, reasoning models churn out long, redundant chains, while non-reasoning models simply flag the question as unanswerable (see "Why do reasoning models overthink ill-posed questions?"). The reason is structural: training optimizes for generating reasoning steps and never teaches the model *when to disengage*. So "stop thinking" isn't a small tweak; it runs against the grain of how reasoning ability gets built.
Part of the difficulty is that knowing the answer and knowing what you're missing are two different skills. Models that ace fully specified problems drop to 40–50% accuracy when one variable is withheld and they must identify which clarifying question to ask; information-gathering and problem-solving turn out to be separable cognitive operations (see "Can models identify what information they actually need?"). So a model can be brilliant at execution and still blind to the fact that it has nothing valid to execute on.
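One way to make that separation concrete is to score the two skills independently. The sketch below is a hypothetical harness, not the method used in the cited note: the `Item` fields, the keyword check in `identifies_missing`, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    withheld: str             # variable removed from the problem (assumed label)
    clarifying_question: str  # what the model asked on the incomplete problem
    solved_full: bool         # whether it solved the fully specified version


def identifies_missing(item: Item) -> bool:
    # Crude proxy (an assumption): the question counts as correct
    # if it names the withheld variable.
    return item.withheld.lower() in item.clarifying_question.lower()


def score(items: list[Item]) -> tuple[float, float]:
    """Return (execution accuracy, information-gathering accuracy)."""
    if not items:
        raise ValueError("need at least one item")
    n = len(items)
    execution = sum(i.solved_full for i in items) / n
    gathering = sum(identifies_missing(i) for i in items) / n
    return execution, gathering


# Made-up sample: every full problem is solved, but only half of the
# clarifying questions target the variable that was actually withheld.
items = [
    Item("speed", "What is the train's speed?", True),
    Item("price", "How many apples are there?", True),
    Item("radius", "Can you give me the radius?", True),
    Item("interest rate", "What is the principal?", True),
]
execution_acc, gathering_acc = score(items)
print(execution_acc, gathering_acc)
```

Reporting the two numbers side by side, rather than a single blended score, is what lets a gap between execution and information-gathering show up at all.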
The encouraging news is that this is learnable. Reinforcement learning pushed proactive critical-thinking accuracy on deliberately flawed math problems from 0.15% to 73.98%, though the ability stays fragile without explicit training. Interestingly, giving untrained models more inference-time compute degraded this ability, while the same extra compute helped after RL (see "Can models learn to ask clarifying questions instead of guessing?"). A gentler route is social meta-learning (SML): models trained only on complete problems still generalize to underspecified ones by asking for what's missing and delaying their answer, picking up the meta-strategy on their own (see "Can models learn to ask clarifying questions without explicit training?"). And the deeper fix may be reward design: standard RLHF optimizes for immediate helpfulness, which quietly punishes the move of pausing to ask (see "Why do language models respond passively instead of asking clarifying questions?").
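To illustrate what "reward design" could mean here, the following is a minimal sketch of a reward that pays for disengaging on ill-posed problems instead of punishing it. It is not the reward used in any of the cited notes; the `Episode` fields, the constants, and the linear length penalty are assumptions chosen only to make the idea tangible.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    answerable: bool       # ground truth: problem contains every needed premise
    flagged_missing: bool  # model said information is missing or asked for it
    correct: bool          # final answer was right (only meaningful if answerable)
    reasoning_tokens: int  # length of the reasoning trace


def reward(ep: Episode, length_penalty: float = 0.001) -> float:
    """Toy reward (all constants are illustrative assumptions)."""
    if ep.answerable:
        if ep.flagged_missing:
            return -0.5  # over-refusal on a well-posed problem
        return 1.0 if ep.correct else 0.0
    if ep.flagged_missing:
        # Credit for disengaging, reduced the longer the model reasoned first.
        return max(0.0, 1.0 - length_penalty * ep.reasoning_tokens)
    return -1.0  # answered an unanswerable question anyway


print(reward(Episode(answerable=False, flagged_missing=True,
                     correct=False, reasoning_tokens=200)))
```

The over-refusal penalty matters in a scheme like this: without it, a policy could collect reward by flagging every problem as unanswerable.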
Worth pulling one thread further: "stop thinking" is really a special case of *calibrating how much to think at all*. Models can be trained to route between extended reasoning and quick answers without difficulty labels (see "Can models learn when to think versus respond quickly?"), and the optimal amount of reasoning follows an inverted-U: past a point, more thinking lowers accuracy, and capable models naturally drift toward shorter chains (see "Why does chain of thought accuracy eventually decline with length?"). Seen that way, refusing to reason on an unanswerable question is the far end of the same dial that decides reasoning length on an answerable one, and abstaining when uncertain is itself a trainable, underused skill (see "Can models learn to abstain when uncertain about predictions?").
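The "single dial" framing can be sketched as one function that returns abstention, a direct answer, or a reasoning budget. This is a hand-written toy, not a learned router like the one described in the cited notes: `choose_mode`, its thresholds, and the assumption that answerability and difficulty estimates are available as numbers in [0, 1] are all hypothetical.

```python
def choose_mode(p_answerable: float, difficulty: float,
                abstain_below: float = 0.3, direct_below: float = 0.2,
                max_tokens: int = 2048) -> tuple[str, int]:
    """Map two estimates onto one dial: (mode, reasoning-token budget)."""
    for name, value in (("p_answerable", p_answerable),
                        ("difficulty", difficulty)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    if p_answerable < abstain_below:
        return ("abstain", 0)  # far end of the dial: do not reason at all
    if difficulty < direct_below:
        return ("direct", 0)   # answerable and easy: respond quickly
    # Budget grows with difficulty (illustrative linear rule).
    return ("think", round(max_tokens * difficulty))


print(choose_mode(0.1, 0.9))
print(choose_mode(0.9, 0.1))
print(choose_mode(0.9, 0.5))
```

The answerability check comes first on purpose: a hard-looking problem with a missing premise should end in abstention, not in the largest reasoning budget.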
Sources (8 notes)
Reasoning models generate redundant, lengthy responses to questions with missing premises while non-reasoning models correctly identify them as unanswerable. Training optimizes for producing reasoning steps but never teaches models when to disengage.
Models achieving high accuracy on complete reasoning tasks drop to 40-50% accuracy identifying what clarifying question to ask when one variable is withheld. Information gathering and problem execution are separable cognitive operations.
Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.
Models trained via social meta-learning (SML) on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.
CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.
Thinkless trains a single model to select between extended reasoning and direct responses using DeGRPO, which decouples mode selection from answer refinement. This prevents mode collapse and enables self-calibrated routing without explicit difficulty labels.
Task accuracy peaks at intermediate CoT length, with optimal length increasing alongside task difficulty but decreasing with model capability. RL training naturally gravitates toward shorter chains as models improve, revealing that simplicity emerges from reward signals rather than explicit training.
Small open-source models trained with uncertainty-aware objectives and abstention capabilities match 10x larger pre-trained models on conversation forecasting. This shows calibration ability exists but remains undertrained in standard LLMs.