Can AI systems identify important unanswered questions that emerge during reasoning?
This explores whether AI can recognize the gaps and open questions inside its own reasoning — knowing what it doesn't know and what it would need to ask — rather than barreling ahead to an answer.
This explores whether AI can recognize the gaps and open questions inside its own reasoning, not just solve problems handed to it cleanly. The corpus's sharpest finding is that these are two different skills. A model can ace a fully-specified problem and still fail to notice when a problem is broken: when one variable is quietly withheld, accuracy on identifying the right clarifying question to ask drops to 40-50%, because information-gathering and problem-execution turn out to be separable cognitive operations Can models identify what information they actually need?. So being good at answering does not make a model good at noticing what's unanswered.
Worse, reasoning-tuned models actively struggle here. Faced with questions that have missing premises, they don't flag them — they generate long, redundant chains of 'reasoning' toward an answer that can't exist, because training rewards producing reasoning steps but never teaches a model when to disengage Why do reasoning models overthink ill-posed questions?. The instinct to recognize 'this question is ill-posed, I should stop and ask' is precisely what optimization erodes.
The encouraging counter-current is that this recognition is learnable, even if it's fragile. Reinforcement learning pushed proactive critical-thinking accuracy on deliberately flawed math problems from essentially zero (0.15%) to 74% — and revealingly, just thinking longer at inference time made untrained models worse at spotting the flaw, while helping trained ones Can models learn to ask clarifying questions instead of guessing?. A gentler route is social meta-learning: models trained only on complete problems can still generalize to underspecified ones, learning to delay answering and treat conversation itself as a place to go get the missing piece Can models learn to ask clarifying questions without explicit training?. The capability isn't absent in the architecture — it just isn't selected for by default.
What the corpus reframes nicely is the difference between an internal gap and an externally-resolvable one. Conversation-analysis work formalizes 'insert-expansions' — the moments where an agent should pause to clarify intent or scope before acting, instead of silently chaining tools toward a misread goal When should AI agents ask users instead of just searching?. That's a structured theory of which unanswered questions are worth surfacing to a human versus resolving alone. And if you want to know where in a reasoning trace such pivots even live, 'thought anchors' research finds that planning and backtracking sentences are the disproportionately influential steering points Which sentences actually steer a reasoning trace? — suggesting that the machinery for 'wait, reconsider' already exists inside traces, even when models don't deploy it to flag genuine gaps.
The thing you may not have expected: the bottleneck here is not intelligence but disposition. Models can be smart enough to solve and still trained out of the humility to notice what's missing — and the fix is less about bigger reasoning and more about teaching a model when to stop reasoning and ask.
Sources 6 notes
Models achieving high accuracy on complete reasoning tasks drop to 40-50% accuracy identifying what clarifying question to ask when one variable is withheld. Information gathering and problem execution are separable cognitive operations.
Reasoning models generate redundant, lengthy responses to questions with missing premises while non-reasoning models correctly identify them as unanswerable. Training optimizes for producing reasoning steps but never teaches models when to disengage.
Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.
Models trained via SML on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.
Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.
Counterfactual resampling, attention analysis, and causal suppression all identify planning and backtracking sentences as thought anchors—sparse critical points that guide subsequent reasoning. These are functional pivots, not noise.