Can models learn to ask genuinely useful clarifying questions?
Explores whether question-asking quality is teachable by decomposing it into specific attributes, such as clarity and relevance, rather than treating it as a monolithic skill.
The ALFA (Aligning LLMs to Ask) framework addresses a specific capability gap: LLMs fail to ask effective questions under uncertainty, making them unreliable in domains where proactive information-gathering is essential for decision-making.
The framework has three components:
- Decompose — break down "good question" into theory-grounded attributes (e.g., clarity, relevance, specificity)
- Synthesize — controllably generate attribute-specific question variations (80K preference pairs)
- Align — preference-based optimization to learn asking better questions along fine-grained attributes
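The Decompose and Synthesize steps can be sketched as a small data pipeline. This is one plausible reading of "controllably generate attribute-specific question variations", not ALFA's published code. It assumes that each seed question is rewritten twice per attribute (once enhanced, once corrupted) and that the pair differs only on that attribute. The attribute tuple, `PreferencePair`, `synthesize_pairs` and `toy_rewrite` are hypothetical names. In a real pipeline the rewriter would be an LLM prompted to change only the named attribute.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical attribute set; ALFA's actual attribute list may differ.
ATTRIBUTES = ("clarity", "relevance", "specificity")


@dataclass(frozen=True)
class PreferencePair:
    attribute: str  # the one attribute on which the two questions differ
    context: str    # conversation so far, e.g. the patient's message
    chosen: str     # variant enhanced along `attribute`
    rejected: str   # variant corrupted along `attribute`


# (question, attribute, direction) -> rewritten question
Rewriter = Callable[[str, str, str], str]


def synthesize_pairs(context: str, question: str, rewrite: Rewriter,
                     attributes=ATTRIBUTES) -> list[PreferencePair]:
    """Build one preference pair per attribute from a single seed question."""
    pairs = []
    for attr in attributes:
        better = rewrite(question, attr, "enhance")
        worse = rewrite(question, attr, "corrupt")
        pairs.append(PreferencePair(attr, context, better, worse))
    return pairs


def toy_rewrite(question: str, attribute: str, direction: str) -> str:
    # Stand-in for an LLM rewrite: it only tags the text so the pairing is visible.
    return f"[{direction} {attribute}] {question}"


pairs = synthesize_pairs("I've had a headache for three days.", "How bad is it?", toy_rewrite)
for p in pairs:
    print(f"{p.attribute}: {p.chosen!r} > {p.rejected!r}")
```

Keeping the attribute label on every pair is the design choice that matters here. It lets later training or evaluation be grouped by attribute instead of collapsing everything into one "better question" label.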
Applied to clinical reasoning using the MediQ-AskDocs dataset (17K real-world clinical interactions), ALFA demonstrates that question quality is not unitary — a question can be clear but irrelevant, or relevant but ambiguous. Decomposing quality into attributes and training against each one produces better overall question-asking than optimizing for a single "question quality" score.
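Training "against each attribute" can be illustrated with a DPO-style objective. This is an assumption: the note says only "preference-based optimization". The sketch computes the standard DPO loss, −log σ(β·[(log π(y_w) − log π_ref(y_w)) − (log π(y_l) − log π_ref(y_l))]), for each pair. It then averages the loss separately per attribute, so that a regression on relevance is not hidden by gains on clarity. How ALFA actually combines the attributes is not described here. The function names and the log-probability numbers are invented for illustration.

```python
import math
from collections import defaultdict


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair, from summed token log-probs under the policy (pi_*)
    and a frozen reference model (ref_*)."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


def per_attribute_loss(batch, beta: float = 0.1) -> dict[str, float]:
    """Mean DPO loss for each attribute.

    batch: iterable of (attribute, pi_chosen, pi_rejected, ref_chosen, ref_rejected).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for attr, pc, pr, rc, rr in batch:
        totals[attr] += dpo_loss(pc, pr, rc, rr, beta)
        counts[attr] += 1
    return {attr: totals[attr] / counts[attr] for attr in totals}


# Made-up log-probabilities, for illustration only.
batch = [
    ("clarity",   -12.0, -15.0, -13.0, -14.0),  # policy moved toward the chosen question: loss below log 2
    ("relevance", -20.0, -18.0, -19.0, -19.0),  # policy moved toward the rejected question: loss above log 2
]
print(per_attribute_loss(batch))
```

When the policy equals the reference, every pair scores log 2. This makes log 2 a convenient baseline for reading the per-attribute numbers.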
The clinical domain makes the stakes concrete: a doctor who asks the wrong clarifying question may miss a critical symptom. Models that excel at static medical QA benchmarks still fail at the interactive task of gathering missing information through conversation. Where the companion note "Can models learn to ask clarifying questions instead of guessing?" asks whether models can be trained to ask at all, ALFA provides the methodology for making those questions actually good, not just present.
This connects to the broader finding on clarification design. The note "Which clarifying questions actually improve user satisfaction?" finds that targeted questions about specific information gaps help users more than requests to rephrase their needs, and the attribute decomposition explains why: a question high on specificity and relevance but low on verbosity will outperform one that merely paraphrases the user's need. Attribute-specific training can target exactly the dimensions that matter.
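The comparison can be made concrete without choosing any weighting between attributes. Given per-attribute scores, one question beats another outright when it is no worse on every attribute and strictly better on at least one (Pareto dominance). The 1–5 ratings and both example questions below are invented for illustration. Verbosity is treated as "lower is better", as in the sentence above.

```python
from typing import Mapping

# +1: higher is better; -1: lower is better.
DIRECTION = {"specificity": +1, "relevance": +1, "verbosity": -1}


def dominates(a: Mapping[str, float], b: Mapping[str, float]) -> bool:
    """True if question `a` is at least as good as `b` on every attribute
    and strictly better on at least one."""
    at_least_as_good = all(DIRECTION[k] * (a[k] - b[k]) >= 0 for k in DIRECTION)
    strictly_better = any(DIRECTION[k] * (a[k] - b[k]) > 0 for k in DIRECTION)
    return at_least_as_good and strictly_better


# Hypothetical ratings, for illustration only.
targeted = {"specificity": 5, "relevance": 5, "verbosity": 2}    # "Is the pain on one side of your head or both?"
paraphrase = {"specificity": 2, "relevance": 4, "verbosity": 4}  # "Could you describe your problem in more detail?"

print(dominates(targeted, paraphrase))  # True
print(dominates(paraphrase, targeted))  # False
```

Dominance only settles clear-cut cases. When two questions trade off (for example, more specific but also more verbose), some weighting or a learned preference is still needed, which is where attribute-specific training comes in.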
PerQs provides practical validation of attribute-based question quality at scale. The Active Listening system populates prompt templates with 400+ real user interests (aggregated from ~39K anonymous user models) and uses an LLM to generate personalized Q&A pairs (~19K in total). Deployed in the Alexa Prize, personalized questions had a significant positive effect on perceived conversation quality. The interest-personalization dimension shows that "good questions" are not only structurally well formed (ALFA's clarity, relevance, and specificity attributes) but also aligned in content with user interests, a dimension that attribute-specific training could incorporate as an additional quality axis.
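The template-population step can be sketched as a cross product of templates and interests, with each filled prompt sent to a generator. The template wording, `build_prompts`, `generate_qa_pairs` and the stand-in `fake_llm` are all hypothetical. This note does not give the actual PerQs templates or model.

```python
from itertools import product

# Hypothetical prompt templates; the real PerQs templates are not given in this note.
TEMPLATES = [
    "Write a friendly question asking a user what they enjoy most about {interest}, "
    "plus a short answer the bot could share.",
    "Write a question inviting a user to recommend something related to {interest}, "
    "plus a short answer the bot could share.",
]


def build_prompts(templates, interests):
    """Fill every template with every interest, yielding (interest, prompt) pairs."""
    for template, interest in product(templates, interests):
        yield interest, template.format(interest=interest)


def generate_qa_pairs(templates, interests, llm):
    """`llm` is any callable prompt -> text, standing in for the generation model."""
    return [(interest, llm(prompt)) for interest, prompt in build_prompts(templates, interests)]


interests = ["hiking", "jazz", "baking"]
fake_llm = lambda prompt: f"<generated for: {prompt}>"  # placeholder, no real model call
pairs = generate_qa_pairs(TEMPLATES, interests, fake_llm)
print(len(pairs))  # 6
```

Tagging each generated pair with its source interest keeps interest alignment measurable. This is the property that would let it serve as an additional quality axis alongside clarity, relevance, and specificity.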
Inquiring lines that use this note as a source (69)
- How do attribute-asking strategies depend on current confidence in candidate items?
- Why does item discrimination matter more than surface-level question plausibility?
- Could AI assessment quality differ across subjects or question formats?
- Why can't language models conduct genuine Socratic questioning in therapy sessions?
- Can proactive critical thinking alone enable models to request clarification effectively?
- Can language systems learn when to ask for clarification instead of choosing one reading?
- What measurement artifacts emerge when annotators interpret the same question differently?
- Can models identify information gaps without just guessing or refusing to answer?
- Can personalized questions improve conversation quality in open-domain chat?
- How does asymmetric information shape what to ask users first?
- Can models learn to select exemplars based on reasoning skills rather than complexity?
- Does training on critiques of noisy responses produce deeper understanding than imitating correct ones?
- What makes some clarifying questions more useful than others?
- Why might expressed satisfaction with explanations diverge from actual cognitive clarity?
- Can testing prior knowledge and checking understanding improve explanation outcomes?
- What interaction patterns preserve human learning when AI provides domain answers?
- Can language models implement therapeutic skills like Socratic questioning in real conversations?
- Can language models understand the implicit emotional intent behind questions?
- Does current empathetic AI misalign with how humans actually ask questions?
- How does conversational closure differ from genuine problem understanding?
- Can conversation analysis predict when agents should ask users for clarification?
- Can proactive critical thinking train models to request clarification actively?
- How do contrasting examples improve AI feedback quality over generic suggestions?
- How does random walk length control reasoning complexity in question generation?
- How does ambiguity detection connect to models' ability to ask clarifying questions?
- What structural changes enable agents to ask clarifying questions?
- Can static word-sharing create genuine communicative grounding between humans and models?
- Can LLMs learn to ask clarifying questions instead of guessing?
- Can critic model trios evaluate reasoning quality more reliably than outcome rewards alone?
- Can models learn to identify what information is missing from questions?
- Can reward models trained for engagement fix the informativeness problem?
- Can attribute-specific preference optimization improve question quality in information-seeking?
- Why do weaker language models fail at multi-turn strategic questioning?
- Can language models ask clarifying questions when sentences are ambiguous?
- What distinguishes contrasting aspects from related aspects in question structure?
- Can the eight-dimension rubric predict which question types need decomposition?
- Why do more detailed rating systems sometimes improve learning from reviews?
- Can question quality be trained separately from the decision to ask?
- What makes specific-facet questions outperform generic need-rephrasing requests?
- What distinguishes proactive information provision from proactive clarification seeking?
- How does the Question Under Discussion shape what content projects?
- Why do question types determine retrieval and decomposition strategy in QA?
- Can models learn both what and how to study through reinforcement learning?
- What training approach enables models to proactively request clarification?
- What filtering criteria best identify student-compatible refinements from teacher models?
- How does RLHF training push chatbots toward problem-solving over exploration?
- Can reinforcement learning teach AI when to ask clarifying questions?
- Can models be trained to explain instead of imitate answers?
- Can AI learn intrinsic motivation to assess its own relevance?
- How does RLHF training reward models for guessing over asking clarifying questions?
- Why do specific clarifying questions outperform rephrased versions of user needs?
- Can attribute decomposition improve other interactive reasoning tasks beyond clinical questioning?
- What makes a clarifying question aligned with user interests versus structurally sound?
- Can tree search improve question generation the way it improves reasoning?
- Why do specific clarifying questions outperform generic requests for clarity?
- Do scheme critical questions work better than direct scheme classification prompts?
- Why do models struggle with asking questions in multi-turn conversational reasoning tasks?
- Can Q-priming further strengthen clarifying question behavior beyond social meta-learning alone?
- Can evaluation trajectories and interaction histories replace single-answer scoring?
- Can models learn to ask clarifying questions instead of making assumptions?
- How much does forcing single-choice answers damage alignment with complex intent?
- Why do explicit quality criteria outperform learning quality from examples alone?
- Can thought quality alone be trusted to guide model training?
- How do students learn to extract corrective information from asymmetric dialogue?
- Can structured questioning prompts improve reasoning beyond standard conversational training?
- Can question-only features replace model uncertainty checks at scale?
- Do models naturally learn to ask clarifying questions without explicit supervision?
- Which types of clarifying questions actually help users versus wasting their time?
- How can models select the optimal question to ask given multiple uncertainties?
Related concepts in this collection (4)
- Can models learn to ask clarifying questions instead of guessing?
Exploring whether large language models can be trained to detect incomplete queries and actively request missing information rather than hallucinating answers or refusing to respond. This matters because conversational agents today remain passive, responding only when prompted.
ALFA provides the quality methodology for the proactive questioning capability
- Which clarifying questions actually improve user satisfaction?
Not all clarification helps equally. This explores whether asking users to rephrase their needs works as well as asking targeted questions about specific information gaps.
attribute decomposition explains why specific questions outperform rephrasing
- Can models identify what information they actually need?
When a reasoning task is missing a key piece of information, can language models recognize what's absent and ask the right clarifying question? QuestBench tests this capability directly.
ALFA directly trains the capabilities of identifying missing information and asking about it
- What makes strategic question-asking succeed or fail?
Explores whether excellent performance at multi-turn questioning requires one dominant skill or the coordinated interaction of multiple distinct capabilities. This matters because many real-world tasks (diagnosis, troubleshooting, clarification) depend on this ability.
20Q reveals the three capabilities strategic questioning requires; ALFA's attribute-specific training directly shapes the planning component (question efficiency, specificity)
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning
- STaR-GATE: Teaching Language Models to Ask Clarifying Questions
- The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
- QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
- Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy
- Learning to Learn from Language Feedback with Social Meta-Learning
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
- Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents
Original note title
training models to ask good questions requires decomposing quality into theory-grounded attributes and aligning via attribute-specific preference optimization