INQUIRING LINE


Models can be trained to ask 'what do you mean?' first — but that instinct fights how they're normally built.

Can models learn to ask clarifying questions instead of making assumptions?

This explores whether models can be trained to recognize when a task is underspecified and ask for clarification — rather than barreling ahead on a guess — and what the corpus reveals about why they default to guessing in the first place.

The corpus answers with a qualified yes: the capability is learnable, but it runs against the grain of how models are normally trained. Several lines of work show the behavior can be installed directly. Reinforcement-learning training for proactive critical thinking lifts the rate of correctly flagging flawed math problems from near zero (0.15%) to roughly 74% (see "Can models learn to ask clarifying questions instead of guessing?"), and reinforcement learning on multi-turn collaboration can teach a model to delay answering and probe for intent (see "Why do language models respond passively instead of asking clarifying questions?"). Strikingly, the behavior can also emerge without being trained for directly: models trained via social meta-learning (SML) only on fully specified problems generalize to underspecified ones by asking for what's missing, treating the conversation itself as a place to gather information (see "Can models learn to ask clarifying questions without explicit training?").

But the more interesting finding is *why* models guess instead of asking, and the corpus traces it to the reward signal. Standard RLHF optimizes for being immediately helpful in the next turn, which actively discourages a model from pausing to ask a question, since a question looks less helpful than an answer in the moment (see "Why do language models respond passively instead of asking clarifying questions?"). The same training that makes models agreeable makes them accept false premises they actually know are wrong: models will accommodate a flawed presupposition rather than correct it, not from ignorance but from a learned preference for going along (see "Why do language models accept false assumptions they know are wrong?" and "Why do language models agree with false claims they know are wrong?"). So the failure to ask isn't a knowledge gap; it's a disposition baked in by how we reward answers.
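To make the reward argument concrete, here is a toy sketch contrasting a next-turn reward with a multi-turn-aware one. Everything in it is an assumption for illustration: the candidate names, the `immediate_helpfulness` scores, the simulated rollouts, and the `turn_cost` penalty are made up, and this is not the reward used by CollabLLM or any other system cited here.

```python
from statistics import mean

def next_turn_reward(candidate):
    """Score a response only by how helpful it looks right now."""
    return candidate["immediate_helpfulness"]

def multi_turn_reward(candidate, turn_cost=0.05):
    """Average the final outcome over simulated continuations of the
    conversation, minus a small cost for each extra turn (hypothetical)."""
    return mean(
        outcome - turn_cost * extra_turns
        for outcome, extra_turns in candidate["simulated_rollouts"]
    )

# Illustrative numbers only: (final task outcome, extra turns needed).
candidates = {
    "answer_now": {
        "immediate_helpfulness": 0.8,
        # A lucky guess once; otherwise the user has to correct the model.
        "simulated_rollouts": [(0.9, 0), (0.2, 2), (0.3, 2)],
    },
    "ask_first": {
        "immediate_helpfulness": 0.4,
        # One extra turn, but the answer then fits the user's intent.
        "simulated_rollouts": [(0.9, 1), (0.85, 1), (0.9, 1)],
    },
}

best_next = max(candidates, key=lambda c: next_turn_reward(candidates[c]))
best_multi = max(candidates, key=lambda c: multi_turn_reward(candidates[c]))
print("next-turn reward prefers: ", best_next)
print("multi-turn reward prefers:", best_multi)
```

With these made-up numbers the next-turn reward prefers answering immediately, while the rollout-averaged reward prefers asking first. That is the shape of the incentive problem described above, not a measurement of it.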

There's a second, sharper failure mode worth knowing about: reasoning models can make the problem *worse*. Faced with a question that's missing a premise, they don't stop. They overthink, generating long, redundant chains for a problem that has no answer, while plainer non-reasoning models more readily call it unanswerable (see "Why do reasoning models overthink ill-posed questions?"). Training rewards producing reasoning steps but never teaches the model *when to disengage*. And what looks like good judgment is sometimes a hollow shortcut: when constraints are removed, most models actually perform worse, revealing that they were defaulting conservatively rather than genuinely reasoning about the situation (see "Are models actually reasoning about constraints or just defaulting conservatively?").

The frontier here isn't just *whether* to ask but *what* to ask. A clarifying question is only useful if it's the right one. Two threads tackle this. One decomposes question quality into concrete attributes (clarity, relevance, specificity) and trains on attribute-specific preferences, which beats optimizing a single quality score, especially in clinical settings where the choice of question directly affects decision quality (see "Can models learn to ask genuinely useful clarifying questions?"). The other treats question selection as an information-gain problem: simulate the possible answers to each candidate question and pick the one that would shrink uncertainty the most (see "How can models select the most informative question to ask?").
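The information-gain idea can be sketched in a few lines. This is a minimal illustration under strong assumptions, not the UoT method itself: hypotheses have known prior probabilities, each hypothesis determines the answer to each question, and the hypotheses, questions, and function names are all hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_information_gain(prior, answer_of):
    """prior: hypothesis -> probability.
    answer_of: hypothesis -> the answer that hypothesis would produce.
    Returns prior entropy minus expected posterior entropy."""
    groups = {}
    for hypothesis, p in prior.items():
        groups.setdefault(answer_of[hypothesis], []).append(p)
    expected_posterior = 0.0
    for probs in groups.values():
        mass = sum(probs)  # probability of observing this answer
        if mass > 0:
            expected_posterior += mass * entropy([p / mass for p in probs])
    return entropy(prior.values()) - expected_posterior

def pick_question(prior, questions):
    """Choose the candidate question with the highest expected gain."""
    return max(questions, key=lambda q: expected_information_gain(prior, questions[q]))

# Hypothetical example: four equally likely explanations for a symptom.
prior = {"flu": 0.25, "cold": 0.25, "allergy": 0.25, "covid": 0.25}
questions = {
    "Do you have a fever?": {"flu": "yes", "cold": "no", "allergy": "no", "covid": "yes"},
    "Do your eyes itch?": {"flu": "no", "cold": "no", "allergy": "yes", "covid": "no"},
}

for q, answer_of in questions.items():
    print(q, round(expected_information_gain(prior, answer_of), 3))
print("ask:", pick_question(prior, questions))
```

In this toy setup the fever question splits the hypotheses evenly and so removes more uncertainty in expectation than the itch question, which isolates only one hypothesis. A real system would have to estimate the answer distributions by simulation rather than read them from a table.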

What ties this together, and what you might not have expected, is that "ask vs. assume" is one instance of a broader skill the corpus keeps circling: knowing the limits of your own knowledge and acting on them. The same calibration that lets a small model abstain when it's unsure, matching models ten times its size (see "Can models learn to abstain when uncertain about predictions?"), and the routing that lets a model decide when to think hard versus answer fast (see "Can models learn when to think versus respond quickly?"), are cousins of asking a clarifying question. All three are a model choosing *not* to commit prematurely. The capability is real and trainable, but it's fragile, and it only sticks when training rewards the pause instead of punishing it.

Sources 11 notes

Can models learn to ask clarifying questions instead of guessing?

Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Can models learn to ask clarifying questions without explicit training?

Models trained via SML on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.

Why do language models accept false assumptions they know are wrong?

The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.

Why do language models agree with false claims they know are wrong?

The FLEX benchmark shows models reject false presuppositions at dramatically different rates (GPT-4: 84% vs. Mistral: 2.44%), not from ignorance but from a preference for agreement learned via RLHF. This social accommodation is distinct from hallucination and requires different fixes.


Why do reasoning models overthink ill-posed questions?

Reasoning models generate redundant, lengthy responses to questions with missing premises while non-reasoning models correctly identify them as unanswerable. Training optimizes for producing reasoning steps but never teaches models when to disengage.

Are models actually reasoning about constraints or just defaulting conservatively?

Twelve of fourteen models perform worse when constraints are removed, dropping up to 38.5 percentage points. Models appear to reason correctly by defaulting to harder options, not by actually evaluating constraints.

Can models learn to ask genuinely useful clarifying questions?

The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.

How can models select the most informative question to ask?

UoT combines uncertainty-aware scenario simulation with information-gain scoring and reward propagation to identify questions whose possible answers maximally reduce diagnostic uncertainty—providing a principled mechanism for specific, high-value clarification rather than generic prompts.

Can models learn to abstain when uncertain about predictions?

Small open-source models trained with uncertainty-aware objectives and abstention capabilities match 10x larger pre-trained models on conversation forecasting. This shows calibration ability exists but remains undertrained in standard LLMs.

Can models learn when to think versus respond quickly?

Thinkless trains a single model to select between extended reasoning and direct responses using DeGRPO, which decouples mode selection from answer refinement. This prevents mode collapse and enables self-calibrated routing without explicit difficulty labels.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst tasked with re-evaluating whether large language models can learn to ask clarifying questions instead of making assumptions — a question that remains open despite recent progress.

What a curated library found — and when (dated claims, not current truth): Findings span 2024–2026 and show the capability is learnable but fragile:
• Proactive critical-thinking training lifts correct flagging of flawed problems from ~0% to ~74% (2025).
• Standard RLHF optimizes for immediate helpfulness, actively discouraging question-asking; models trained this way even accept false premises they know are wrong (2026).
• Reasoning models can worsen the problem — they overthink underspecified questions rather than calling them unanswerable, because training rewards reasoning steps but never teaches disengagement (2025).
• Question quality matters: decomposing it into clarity, relevance, specificity, then training on attribute-specific preferences, outperforms optimizing a single score (2025).
• Information-gain-based question selection — simulating candidate answers to pick the highest uncertainty-reducer — shows promise (2024–2025).

Anchor papers (verify; mind their dates):
• arXiv:2507.23407 (2025) — proactive critical thinking and questioning behavior.
• arXiv:2602.16488 (2026) — social meta-learning and emergent clarifying-question behavior.
• arXiv:2602.07338 (2026) — intent mismatch in multi-turn conversation.
• arXiv:2505.00127 (2025) — reasoning length and overthinking on underspecified problems.

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer model architectures (e.g., o-series reasoning, multimodal), training methods (DPO, process-reward models), tooling (agentic orchestration, persistent memory), or evals have since relaxed or overturned it. Distinguish the durable question — *when should a model pause to ask?* — from perishable limitations tied to 2024–2026 RLHF pipelines. Cite what resolved each constraint, and plainly state where it still holds.
(2) Surface the strongest contradicting or superseding work from the last ~6 months that challenges the library's consensus that asking is trainable but fragile.
(3) Propose 2 research questions that assume the training regime may have fundamentally shifted (e.g., outcome-based RL, long-horizon preference learning, or self-play on multi-turn intent discovery).

Cite arXiv IDs; flag anything you cannot ground in a real paper.
