How can models select the most informative question to ask?
Explores whether simulating possible futures and scoring questions by information gain can identify which clarifying question would best reduce uncertainty—moving beyond just deciding whether to ask toward deciding what to ask.
Most work on clarifying questions addresses WHETHER to ask. Uncertainty of Thoughts (UoT) addresses WHAT to ask — and provides a principled, information-theoretic mechanism for selecting the optimal question.
The algorithm has three components working together:
Uncertainty-aware simulation: the model generates multiple candidate questions, then simulates possible future scenarios for each — what might the user answer, and what would each answer imply? These simulations form a tree structure of possible futures.
Information-gain rewards: each simulated path is scored by how much it reduces the model's uncertainty about the true answer. Questions whose possible answers would maximally distinguish between remaining possibilities score highest.
Reward propagation: expected rewards are computed across all simulated futures, allowing selection of the question with highest expected information gain — the one that, on average across possible answers, most reduces uncertainty.
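The three components above can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the UoT paper's exact reward formulation: the possibility set is small and explicit, the prior over it is uniform, every question is yes/no with a deterministic simulated answer, and propagation takes the best follow-up question at each branch. All names (`info_gain`, `expected_reward`, `select_question`, the toy diseases and questions) are hypothetical.

```python
import math
from typing import Dict, FrozenSet

# Hypothetical representation: each question maps to the set of candidates
# for which the simulated answer is "yes".
Questions = Dict[str, FrozenSet[str]]


def info_gain(candidates: FrozenSet[str], yes_set: FrozenSet[str]) -> float:
    """Expected entropy reduction in bits for a yes/no question (uniform prior)."""
    yes = candidates & yes_set
    no = candidates - yes_set
    gain = 0.0
    for part in (yes, no):
        if part:
            p = len(part) / len(candidates)
            gain -= p * math.log2(p)
    return gain


def expected_reward(candidates: FrozenSet[str], name: str,
                    questions: Questions, depth: int) -> float:
    """Immediate gain plus the probability-weighted reward of the best follow-up."""
    reward = info_gain(candidates, questions[name])
    if depth <= 1:
        return reward
    yes = candidates & questions[name]
    for part in (yes, candidates - yes):
        if len(part) > 1:  # a single remaining candidate needs no follow-up
            best_next = max(
                expected_reward(part, q, questions, depth - 1) for q in questions
            )
            reward += len(part) / len(candidates) * best_next
    return reward


def select_question(candidates: FrozenSet[str], questions: Questions,
                    depth: int = 2) -> str:
    """Pick the question with the highest propagated expected reward."""
    return max(
        questions,
        key=lambda name: expected_reward(candidates, name, questions, depth),
    )


# Toy data, invented for illustration only.
CANDIDATES = frozenset({"flu", "cold", "malaria", "dengue"})
QUESTIONS: Questions = {
    "fever": frozenset({"flu", "malaria", "dengue"}),
    "travel": frozenset({"malaria", "dengue"}),
    "rash": frozenset({"dengue"}),
}

if __name__ == "__main__":
    print(select_question(CANDIDATES, QUESTIONS, depth=2))
```

In this toy setup the even split wins: "travel" halves the candidates, while "fever" and "rash" each isolate only one of four. In a real system the simulated answers and the possibility set would come from the model itself, so the partitions would be estimates rather than fixed sets.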
The medical diagnosis framing makes the mechanism concrete: a patient doesn't report full symptoms. The doctor must decide which question to ask next. A question like "Do you have a fever?" partitions the diagnostic space differently than "Have you traveled recently?" UoT formalizes this: given the current possibility set (diseases consistent with reported symptoms so far), which question's possible answers would most effectively narrow that set?
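The partitioning intuition can be written as a worked equation. The notation is an assumption introduced here for illustration: $\Omega$ is the current possibility set with a uniform prior, and a yes/no question $q$ splits it into $\Omega_{\text{yes}}$ and $\Omega_{\text{no}}$. The numbers are invented, not taken from the source.

```latex
\[
  \mathrm{IG}(q) = -\,p_{\text{yes}} \log_2 p_{\text{yes}} \;-\; p_{\text{no}} \log_2 p_{\text{no}},
  \qquad
  p_{a} = \frac{|\Omega_{a}|}{|\Omega|}
\]

% Hypothetical case: 8 candidate diseases remain.
% "Do you have a fever?" splits them 4 / 4:
\[
  \mathrm{IG}(q_{\text{fever}}) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit}
\]
% "Have you traveled recently?" splits them 1 / 7:
\[
  \mathrm{IG}(q_{\text{travel}}) = -\tfrac{1}{8}\log_2\tfrac{1}{8} - \tfrac{7}{8}\log_2\tfrac{7}{8} \approx 0.544 \text{ bits}
\]
```

Under these assumed splits the fever question is the better next step because it removes more uncertainty on average. A different possibility set could reverse the ranking.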
This connects directly to proactive critical thinking. Per the note "Can models learn to ask clarifying questions instead of guessing?", the gap that proactive critical thinking fills is DETECTING incompleteness. UoT fills the complementary gap: once incompleteness is detected, SELECTING the most informative question to ask. And in light of "Which clarifying questions actually improve user satisfaction?", UoT provides the mechanism for generating specific-facet questions rather than generic "can you be more specific?" prompts: the information-gain criterion naturally selects for questions that target the highest-value information asymmetry.
The connection to test-time scaling is architectural: UoT is essentially test-time compute applied to question generation. The simulation-propagation loop trades inference-time computation for better question selection, analogous to how reasoning models trade computation for better answers. In the framing of "Can dialogue planning balance fast responses with strategic depth?", UoT's simulation-propagation loop could serve as the System 2 question-selection mechanism within dual-process dialogue planning: when uncertainty triggers the MCTS planner, information-gain scoring provides a principled criterion for which clarifying question to generate next. And in relation to "Can tree search replace human feedback in LLM training?", UoT's reward propagation across simulated futures is structurally analogous to MCTS backpropagation: both use tree search to extract quality signals from exploration of future states.
Inquiring lines that use this note as a source 17
This note is a source for these synthesized inquiries.
- How do humans decide which level of clarification to request?
- Can models identify information gaps without just guessing or refusing to answer?
- How does asymmetric information shape what to ask users first?
- What makes some clarifying questions more useful than others?
- How does uncertainty estimation drive computational resource allocation in models?
- How do experts decide which information matters for a specific audience?
- What structural changes enable agents to ask clarifying questions?
- Can reinforcement learning teach AI when to ask clarifying questions?
- What makes a clarifying question aligned with user interests versus structurally sound?
- Why do specific clarifying questions outperform generic requests for clarity?
- Can models learn to ask clarifying questions instead of making assumptions?
- Can we cheaply estimate which samples are currently most informative?
- Can imperfect uncertainty estimates still beat uniform oversight strategies?
- Can information-gain principles improve how we choose what to label?
- How does expressing uncertainty help models avoid the answer-or-abstain dilemma?
- Which types of clarifying questions actually help users versus wasting their time?
- How can models select the optimal question to ask given multiple uncertainties?
Related concepts in this collection 6
- Can models learn to ask clarifying questions instead of guessing?
  Exploring whether large language models can be trained to detect incomplete queries and actively request missing information rather than hallucinating answers or refusing to respond. This matters because conversational agents today remain passive, responding only when prompted.
  UoT provides the selection mechanism that proactive critical thinking needs: once missing information is detected, which question recovers it fastest
- Which clarifying questions actually improve user satisfaction?
  Not all clarification helps equally. This explores whether asking users to rephrase their needs works as well as asking targeted questions about specific information gaps.
  information-gain criterion naturally selects specific-facet questions over generic rephrasing
- Can AI agents communicate efficiently in joint decision problems?
  When humans and AI must collaborate to solve optimization problems under asymmetric information, what communication patterns enable effective coordination? Current LLMs struggle with this—why?
  UoT operationalizes the asymmetric information problem: simulate what the user might know, ask what most reduces the asymmetry
- When should AI agents ask users instead of just searching?
  Explores whether tool-enabled LLMs should probe users for clarification when uncertain, rather than silently chaining tool calls that drift from intent. Examines conversation analysis patterns as a formal alternative.
  UoT provides the selection mechanism for which insert-expansion to use
- Can dialogue planning balance fast responses with strategic depth?
  Can a system use quick instinctive responses for familiar conversation contexts while activating deeper planning only when uncertainty demands it? This explores whether adaptive computation improves dialogue goal-reaching.
  UoT's simulation loop could serve as the System 2 question-selection mechanism when uncertainty triggers MCTS planning
- Can tree search replace human feedback in LLM training?
  Explores whether Monte Carlo Tree Search can generate quality signals for self-improvement without expensive human annotations. Matters because annotation bottlenecks currently limit LLM scaling.
  structural analogy: UoT's reward propagation across simulated futures parallels MCTS backpropagation of quality signals
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models
- Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents
- TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
- Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
- Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains
- QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
- Chain-of-Thought Reasoning Without Prompting
- Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models
Original note title
uncertainty-aware question selection via information gain simulates possible futures to determine the optimal next question to ask