INQUIRING LINE

Can models identify what information they are missing in underspecified tasks?

This explores whether a model can recognize the gap in an underspecified task — knowing *what it doesn't yet know* — as opposed to solving a problem once all the pieces are handed to it.


This explores whether a model can recognize the gap in an underspecified task — knowing *what it doesn't yet know* — rather than just solving a fully-specified one. The corpus's sharpest finding is that these are two different skills entirely. A model that aces complete reasoning problems drops to 40-50% accuracy the moment you withhold a single variable and ask it what to clarify Can models identify what information they actually need?. Being good at the work doesn't make a model good at noticing the work can't be done yet. Information-gathering and problem-execution are separable operations, and standard training optimizes the second while leaving the first nearly untouched.

The default failure mode is *premature answering* — models barrel ahead and guess rather than pause and ask. The encouraging news is that this gap appears learnable. Reinforcement learning aimed at "proactive critical thinking" took accuracy on deliberately flawed math problems from a near-zero 0.15% up to 74% Can models learn to ask clarifying questions instead of guessing?. And it doesn't always require explicit training on underspecified cases: models trained via social meta-learning on *complete* problems generalized to incomplete ones, learning to treat conversation itself as a source of missing information and to delay answering until they had it Can models learn to ask clarifying questions without explicit training?.

But there's a catch worth sitting with: the capability is real yet fragile. The same proactive-thinking work found that giving an *untrained* model more inference-time compute actually made it worse at spotting missing information — more thinking led to more confident guessing — while the same scaling helped only *after* the RL training was in place Can models learn to ask clarifying questions instead of guessing?. So 'think longer' is not a free fix; it can amplify the wrong instinct.

There's a useful distinction lurking here between two kinds of 'missing.' One is missing *task information* the user could simply provide — the right question fixes it. The other is missing *knowledge that was never in training*, and no amount of clarifying or clever prompting can supply it; prompt optimization only reorganizes what the model already has and hits a hard ceiling on foundational gaps Can prompt optimization teach models knowledge they lack?. A model that can't tell these apart will keep asking questions when it should admit it doesn't know — or worse, confidently fill the gap from strong training priors that override what's actually in front of it Why do language models ignore information in their context?.

Which points at what 'identifying missing information' really demands: calibrated uncertainty and a willingness to abstain. Small models trained with uncertainty-aware objectives and an explicit abstention option matched models ten times larger on forecasting tasks, precisely because they knew when to hold back Can models learn to abstain when uncertain about predictions?. The thread across all of this: knowing what you're missing is less a reasoning skill than a *metacognitive* one — and it's the part of the stack standard training leaves most underdeveloped.


Sources 6 notes

Can models identify what information they actually need?

Models achieving high accuracy on complete reasoning tasks drop to 40-50% accuracy identifying what clarifying question to ask when one variable is withheld. Information gathering and problem execution are separable cognitive operations.

Can models learn to ask clarifying questions instead of guessing?

Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.

Can models learn to ask clarifying questions without explicit training?

Models trained via SML on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can models learn to abstain when uncertain about predictions?

Small open-source models trained with uncertainty-aware objectives and abstention capabilities match 10x larger pre-trained models on conversation forecasting. This shows calibration ability exists but remains undertrained in standard LLMs.

Research prompt for your LLMexpand ↓

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst tracking LLM metacognition. The question: Can models reliably identify what information they are missing in underspecified tasks — and do so *without* being explicitly trained on missing-information scenarios?

What a curated library found — and when (dated claims, not current truth):
Findings span 2024–2026; treat as perishable.
• Problem-solving skill and gap-identification are separable: models scoring 90%+ on complete reasoning tasks drop to 40–50% accuracy when asked to spot and clarify missing variables (2025-03, QuestBench).
• Premature answering is the default failure mode; RL on "proactive critical thinking" lifted gap-spotting on flawed problems from ~0% to 74% (2025-07).
• Untrained models given *more* inference-time compute performed *worse* at spotting missing info — scaling amplified confident guessing — but RL-trained models improved with longer thinking (2025-07).
• Social meta-learning on fully-specified problems generalized to incomplete ones; models learned to treat conversation as a missing-information source without explicit training (2026-02).
• Models confuse two gaps: missing task context (user can provide) vs. missing training knowledge (unfixable by clarification); prompt optimization cannot inject new knowledge (2025-02).
• Calibrated uncertainty + abstention option enabled small models to match 10× larger peers on forecasting under ambiguity (2024-02).

Anchor papers (verify; mind their dates):
• arXiv:2503.22674 (2025-03, QuestBench — the core benchmark for this question)
• arXiv:2507.23407 (2025-07, Proactive Questioning — RL results)
• arXiv:2602.16488 (2026-02, Social Meta-Learning generalization)
• arXiv:2402.03284 (2024-02, Calibrated Forecasting under Uncertainty)

Your task:
(1) RE-TEST EACH CONSTRAINT. For each finding, assess whether newer training regimes (constitutional AI, RLHF variants, chain-of-thought tuning), inference harnesses (uncertainty quantification modules, multi-turn dialogue scaffolds), or evals have since relaxed it. Separate the durable question (what *should* the model ask?) from the perishable limitation (can current methods elicit it?). Flag which constraints still hold and which have eroded.
(2) Surface the strongest work from the last ~6 months that *contradicts* or *supersedes* the RL/social-learning path — especially any showing that inference-time scaling or simple prompting achieves proactive questioning without training.
(3) Propose two research questions that assume the regime may have shifted: (a) whether uncertainty quantification at training time (not just RL) makes gap-spotting emerge without explicit supervision, and (b) whether multi-modal or reasoning-token models (e.g., o1-style) exhibit proactive questioning as an emergent property rather than a learned skill.

Cite arXiv IDs; flag anything you cannot ground in a real paper.

Next inquiring lines