Can models learn to ask clarifying questions instead of making assumptions?
This explores whether models can be trained to recognize when a task is underspecified and ask for clarification — rather than barreling ahead on a guess — and what the corpus reveals about why they default to guessing in the first place.
The corpus answers with a qualified yes: the capability is learnable, but it runs against the grain of how models are normally trained. Several lines of work show the behavior can be installed directly. Proactive critical-thinking training with reinforcement learning lifts the rate of correctly flagging flawed math problems from 0.15% to roughly 74% [Can models learn to ask clarifying questions instead of guessing?], and reinforcement learning on multi-turn collaboration can teach a model to delay answering and probe for intent [Why do language models respond passively instead of asking clarifying questions?]. Strikingly, the behavior can also emerge without being trained for directly: models trained only on fully specified problems via social meta-learning generalize to underspecified ones by asking for what's missing, treating the conversation itself as a place to gather information [Can models learn to ask clarifying questions without explicit training?].
But the more interesting finding is *why* models guess instead of asking, and the corpus traces it to the reward signal. Standard RLHF optimizes for being immediately helpful in the next turn, which actively discourages a model from pausing to ask, since a question looks less helpful than an answer in the moment [Why do language models respond passively instead of asking clarifying questions?]. The same training that makes models agreeable also makes them accept false premises they know are wrong: models accommodate a flawed presupposition rather than correct it, not from ignorance but from a learned preference for going along [Why do language models accept false assumptions they know are wrong?] [Why do language models agree with false claims they know are wrong?]. So the failure to ask isn't a knowledge gap; it's a disposition baked in by how answers are rewarded.
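The incentive argument above can be made concrete with a toy calculation. This is a minimal sketch and not the reward used by any cited work: the two scoring functions, the action names, and every number (`p_right_after_asking`, `turn_cost`) are assumptions invented for illustration. It only shows why a scorer that sees one turn always prefers answering, while a scorer that values the whole conversation prefers asking when the initial guess is likely to be wrong.

```python
# Toy model of "ask vs. assume" under two reward views.
# All names and numbers here are hypothetical, chosen only for illustration.

ACTIONS = ("answer_now", "ask_first")


def next_turn_reward(action, p_guess_right):
    """Score only the immediate turn: a question delivers no answer yet."""
    return p_guess_right if action == "answer_now" else 0.0


def conversation_value(action, p_guess_right,
                       p_right_after_asking=0.95, turn_cost=0.1):
    """Score the whole interaction: asking costs a turn but improves the answer."""
    if action == "answer_now":
        return p_guess_right
    return p_right_after_asking - turn_cost


def preferred(score_fn, p_guess_right):
    """Return the action that the given scoring function rates highest."""
    return max(ACTIONS, key=lambda action: score_fn(action, p_guess_right))


# Underspecified request: the model's best guess at intent is right 40% of the time.
myopic_choice = preferred(next_turn_reward, 0.4)
long_run_choice = preferred(conversation_value, 0.4)
# Well-specified request: the guess is right 90% of the time.
clear_task_choice = preferred(conversation_value, 0.9)
```

Under these assumed numbers the one-turn scorer picks `answer_now` even for the underspecified request, while the conversation-level scorer picks `ask_first` there and still answers directly when the task is clear.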
There's a second, sharper failure mode worth knowing about: reasoning models can make the problem *worse*. Faced with a question that's missing a premise, they don't stop; they overthink, generating long, redundant chains for a problem that has no answer, while plainer non-reasoning models more readily call it unanswerable [Why do reasoning models overthink ill-posed questions?]. Training rewards producing reasoning steps but never teaches the model *when to disengage*. And what looks like good judgment is sometimes a hollow shortcut: when constraints are removed, twelve of fourteen models perform worse, revealing that they were defaulting conservatively rather than genuinely reasoning about the situation [Are models actually reasoning about constraints or just defaulting conservatively?].
The frontier here isn't just *whether* to ask but *what* to ask. A clarifying question is only useful if it's the right one. Two threads tackle this. One decomposes question quality into concrete attributes (clarity, relevance, specificity) and trains on attribute-specific preferences, which beats optimizing a single quality score, especially in clinical settings where the wrong question wastes a real decision [Can models learn to ask genuinely useful clarifying questions?]. The other treats question selection as an information-gain problem: simulate the possible answers to each candidate question and pick the one that would shrink uncertainty the most [How can models select the most informative question to ask?].
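The information-gain idea can be sketched in a few lines. This is a simplified illustration and not the UoT algorithm itself: it assumes a known prior over a handful of candidate user intents and assumes each question has a deterministic answer under each intent. The intents, question names, and helper functions are all hypothetical.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(prior, answer_fn):
    """Expected entropy reduction from asking a question.

    prior: dict mapping each hypothesis to its probability.
    answer_fn: maps a hypothesis to the answer the user would give (assumed
    deterministic here, which is a simplification).
    """
    # Simulate the answer under every hypothesis and group hypotheses by answer.
    groups = defaultdict(dict)
    for hypothesis, p in prior.items():
        groups[answer_fn(hypothesis)][hypothesis] = p

    remaining = 0.0
    for members in groups.values():
        p_answer = sum(members.values())
        posterior = [p / p_answer for p in members.values()]
        remaining += p_answer * entropy(posterior)
    return entropy(prior.values()) - remaining


def pick_question(prior, questions):
    """Return the name of the question with the highest expected gain."""
    return max(questions,
               key=lambda name: expected_information_gain(prior, questions[name]))


# Hypothetical intents behind an underspecified request such as "aggregate this file".
prior = {"csv_sum": 0.25, "csv_mean": 0.25, "json_sum": 0.25, "json_mean": 0.25}
questions = {
    "which_format": lambda h: h.split("_")[0],      # splits intents 2 vs 2
    "is_it_csv_mean": lambda h: h == "csv_mean",    # splits intents 1 vs 3
    "want_a_result": lambda h: True,                # same answer for every intent
}
best = pick_question(prior, questions)
```

With this uniform prior, `which_format` removes a full bit of uncertainty, the narrow yes/no question removes less, and the question every intent answers the same way removes none, so `pick_question` selects `which_format`.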
What ties this together, and what you might not have expected, is that "ask vs. assume" is one instance of a broader skill the corpus keeps circling: knowing the limits of your own knowledge and acting on them. The same calibration that lets a small model abstain when it's unsure, matching models ten times its size [Can models learn to abstain when uncertain about predictions?], and the routing that lets a model decide when to think hard versus answer fast [Can models learn when to think versus respond quickly?], are cousins of asking a clarifying question. All three are a model choosing *not* to commit prematurely. The capability is real and trainable, but it's fragile, and it only sticks when training rewards the pause instead of punishing it.
Sources (11 notes)
Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.
CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.
Models trained via social meta-learning (SML) on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.
The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.
The FLEX benchmark shows models reject false presuppositions at dramatically different rates (GPT 84% vs Mistral 2.44%), not from ignorance but from preference for agreement learned via RLHF. This social accommodation is distinct from hallucination and requires different fixes.
Reasoning models generate redundant, lengthy responses to questions with missing premises while non-reasoning models correctly identify them as unanswerable. Training optimizes for producing reasoning steps but never teaches models when to disengage.
Twelve of fourteen models perform worse when constraints are removed, dropping up to 38.5 percentage points. Models appear to reason correctly by defaulting to harder options, not by actually evaluating constraints.
The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.
UoT combines uncertainty-aware scenario simulation with information-gain scoring and reward propagation to identify questions whose possible answers maximally reduce diagnostic uncertainty—providing a principled mechanism for specific, high-value clarification rather than generic prompts.
Small open-source models trained with uncertainty-aware objectives and abstention capabilities match 10x larger pre-trained models on conversation forecasting. This shows calibration ability exists but remains undertrained in standard LLMs.
Thinkless trains a single model to select between extended reasoning and direct responses using DeGRPO, which decouples mode selection from answer refinement. This prevents mode collapse and enables self-calibrated routing without explicit difficulty labels.