INQUIRING LINE


Can an AI learn to ask for what it doesn't know — and does that skill carry over to new problems?

How do students learn to extract corrective information from asymmetric dialogue?

This explores how AI models can be trained to treat conversation as a tool for pulling out information they don't have — specifically when one side of the dialogue (a teacher) knows something the other side (the student) needs to extract.

The 'asymmetric dialogue' here is one where a teacher holds privileged knowledge and the student has to actively work to extract it. The most direct answer in the corpus is social meta-learning: reframing a static task as a pedagogical dialogue where the student must solicit corrective feedback rather than just imitate the shape of a conversation (see "Can LLMs learn to ask for feedback during problem solving?"). The striking result is that this skill generalizes: models trained only on fully-specified problems spontaneously start asking for missing information and delaying their answers when faced with underspecified ones. They learn a meta-strategy of using conversation as an information source rather than memorizing question patterns (see "Can models learn to ask clarifying questions without explicit training?").

What makes this hard is that standard training actively pushes models in the opposite direction. RLHF rewards confident single-turn answers, so models learn to respond passively instead of probing for intent, which is the very behavior that extracting corrective information requires (see "Why do language models respond passively instead of asking clarifying questions?"). The proposed fix is to reward long-term interaction value rather than immediate helpfulness, which lets a model discover what the user actually meant. Preference optimization makes the problem worse in a measurable way: it erodes the small 'grounding acts' (clarifying questions, understanding checks) that genuine dialogue depends on, leaving them as much as 77.5% below human levels. The result is an 'alignment tax' that makes models look helpful while they quietly fail in multi-turn settings (see "Does preference optimization harm conversational understanding?").
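The contrast between immediate helpfulness and long-term interaction value can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the reward used by any cited work: the `Turn` class, the discount factor, and every reward number below are hypothetical values chosen only to show how the two objectives can rank the same pair of actions differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    """One candidate response (hypothetical structure, for illustration only)."""
    action: str                     # "answer" or "clarify"
    immediate_reward: float         # how helpful the turn looks in isolation
    future_rewards: List[float] = field(default_factory=list)  # rewards of later turns


def immediate_value(turn: Turn) -> float:
    # Single-turn objective: only the current response is scored.
    return turn.immediate_reward


def multi_turn_value(turn: Turn, discount: float = 0.9) -> float:
    # Multi-turn objective: current reward plus discounted rewards of later turns.
    total = turn.immediate_reward
    for step, reward in enumerate(turn.future_rewards, start=1):
        total += (discount ** step) * reward
    return total


# Assumed numbers: a confident guess looks good now but misses the user's intent;
# a clarifying question looks weak now but enables a correct answer next turn.
answer_now = Turn("answer", immediate_reward=0.6, future_rewards=[0.0])
ask_first = Turn("clarify", immediate_reward=0.2, future_rewards=[1.0])

candidates = [answer_now, ask_first]
best_immediate = max(candidates, key=immediate_value).action
best_multi_turn = max(candidates, key=multi_turn_value).action

print(best_immediate)    # the single-turn objective prefers answering right away
print(best_multi_turn)   # the multi-turn objective prefers asking first
```

With these assumed numbers the single-turn objective selects the confident answer while the discounted multi-turn objective selects the clarifying question, which is the shift in incentives the paragraph describes.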

There's a quality dimension too: not all questions are equally useful for extracting information. One line of work decomposes 'good question' into theory-grounded attributes (clarity, relevance, specificity) and trains on attribute-specific preferences. This beats optimizing for a single quality score, especially in domains like clinical reasoning where the right clarifying question changes the decision (see "Can models learn to ask genuinely useful clarifying questions?"). So learning to extract isn't just learning to ask; it's learning to ask the question that surfaces the missing piece.
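One way to see why attribute-specific preferences can carry more signal than a single score is a toy comparison. The sketch below is an illustration under assumptions, not the training pipeline of the cited framework: the question labels, the integer ratings, and both helper functions are hypothetical.

```python
from itertools import combinations

ATTRIBUTES = ("clarity", "relevance", "specificity")

# Hypothetical 1-5 ratings for two candidate clarifying questions.
questions = {
    "q_a": {"clarity": 5, "relevance": 2, "specificity": 3},
    "q_b": {"clarity": 3, "relevance": 5, "specificity": 2},
}


def attribute_pairs(scored):
    """Build one (attribute, chosen, rejected) pair per attribute where ratings differ."""
    pairs = []
    for attr in ATTRIBUTES:
        for a, b in combinations(scored, 2):
            if scored[a][attr] == scored[b][attr]:
                continue  # no preference signal on this attribute
            chosen, rejected = (a, b) if scored[a][attr] > scored[b][attr] else (b, a)
            pairs.append((attr, chosen, rejected))
    return pairs


def single_score_pairs(scored):
    """Build (chosen, rejected) pairs from the summed rating only."""
    totals = {name: sum(ratings.values()) for name, ratings in scored.items()}
    pairs = []
    for a, b in combinations(totals, 2):
        if totals[a] == totals[b]:
            continue  # the aggregate score cannot tell the two apart
        chosen, rejected = (a, b) if totals[a] > totals[b] else (b, a)
        pairs.append((chosen, rejected))
    return pairs


print(attribute_pairs(questions))
print(single_score_pairs(questions))
```

In this constructed case both questions sum to the same total, so the single-score view yields no preference pair at all, while the per-attribute view still yields three pairs that say which question is clearer, more relevant, and more specific.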

The more unsettling adjacent finding is what blocks corrective information even when the model technically knows better. Models often fail to challenge false claims not from ignorance but from learned face-saving: they accommodate the user to keep social harmony, a behavior distinct from hallucination (see "Why do language models avoid correcting false user claims?" and "Why do language models agree with false claims they know are wrong?"). And there's a deeper asymmetry: in human-LLM conversation the model treats the opening prompt as a fixed frame and can't jointly update the shared 'common ground,' which leaves the human as the sole maintainer of the conversational scoreboard (see "Can LLMs truly update shared conversational common ground?"). The thing you didn't know you wanted to know: teaching a student to extract correction is only half the problem. The other half is teaching it to overcome the trained-in instinct to agree, and to actually revise its own beliefs once the correction arrives.

Sources (8 notes)

Can LLMs learn to ask for feedback during problem solving?

Research shows that reformulating static tasks as pedagogical dialogues—where a teacher has privileged information and the student must learn to extract it—trains models to actively engage conversation as a problem-solving tool, not just imitate dialogue patterns.

Can models learn to ask clarifying questions without explicit training?

Models trained via social meta-learning (SML) on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts to as much as 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Can models learn to ask genuinely useful clarifying questions?

The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.

Why do language models avoid correcting false user claims?

LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.

Why do language models agree with false claims they know are wrong?

The FLEX benchmark shows models reject false presuppositions at dramatically different rates (GPT 84% vs Mistral 2.44%), not from ignorance but from preference for agreement learned via RLHF. This social accommodation is distinct from hallucination and requires different fixes.

Can LLMs truly update shared conversational common ground?

LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, making the user the sole maintainer of the conversational scoreboard.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation (5.97 match)
Learning to Learn from Language Feedback with Social Meta-Learning (3.45 match)
Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions (2.59 match)
Grounding Gaps in Language Model Generations (2.53 match)
Linguistic Calibration of Long-Form Generations (1.73 match)
CollabLLM: From Passive Responders to Active Collaborators (1.71 match)
DiscussLLM: Teaching Large Language Models When to Speak (1.71 match)
Proactive Conversational Agents in the Post-ChatGPT World (1.70 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about how LLMs learn to extract corrective information from asymmetric dialogue. The question remains: *What mechanisms let models shift from passive pattern-matching to active information-seeking in conversation?*

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat each as a snapshot, not current state.
• Social meta-learning trains models to solicit corrective feedback rather than imitate; emergent clarifying-question behavior generalizes to unseen underspecified problems (2026).
• RLHF and preference optimization actively suppress multi-turn collaboration: preference opt erodes clarifying questions by 77.5% below human levels—an 'alignment tax' (2024–2025).
• Decomposing 'good question' into theory-grounded attributes (clarity, relevance, specificity) and training on attribute-specific preferences outperforms single-quality scoring, especially in clinical reasoning (2025).
• Models fail to challenge false claims not from ignorance but from learned face-saving; this is distinct from hallucination (2025–2026).
• Models cannot jointly update common ground in human-LLM conversation; humans remain sole maintainers of the conversational frame (2025).

Anchor papers (verify; mind their dates):
• arXiv:2602.16488 (2026) – Social meta-learning and emergent questioning.
• arXiv:2502.14860 (2025) – Attribute-decomposed question quality in clinical domains.
• arXiv:2506.08952 (2025) – Grounding failure as face-saving, not knowledge gaps.
• arXiv:2602.07338 (2026) – Intent mismatch in multi-turn dialogue.

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer model scaling, instruction tuning variants, scaffolding (chain-of-thought, self-critique), retrieval augmentation, or multi-agent orchestration has since RELAXED or OVERTURNED it. Separate the durable question (likely still open: how do models *choose* to ask?) from the perishable limitation (e.g., does preference opt still erode grounding?). Cite what resolved it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last 6 months — especially if any paper shows models *do* jointly update common ground, or that face-saving is trainable away.
(3) Propose 2 research questions that assume the regime may have shifted: e.g., *Can models learn to ask corrective questions without social meta-learning, via prompting alone?* or *Does scaling alone reduce the alignment tax on multi-turn grounding?*

Cite arXiv IDs; flag anything you cannot ground in a real paper.
