How does asymmetric information shape what to ask users first?
This explores how the gap between what the system knows and what the user privately holds determines which clarifying question earns the most by being asked first.
Information asymmetry here means the gap between what the system already knows and what only the user knows, and it should drive the first question an AI asks. The corpus gives a surprisingly coherent answer: ask the question whose possible answers would most reduce your uncertainty, not the one that sounds most polite or generic. The cleanest formalization is information-gain selection, in which a model simulates the range of answers a candidate question could produce and picks the one that collapses the most uncertainty about what the user wants (How can models select the most informative question to ask?). The same logic appears in preference learning, where roughly ten adaptively chosen questions suffice to pin down a user's personalized reward coefficients, because each question is selected to target the dimension the system is currently least sure about (Can user preferences be learned from just ten questions?).
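A minimal sketch of information-gain selection follows. It assumes a discrete set of candidate user intents with a known prior and a noise-free user whose answer is fully determined by their intent. This is a simplification, not UoT itself, which also simulates scenarios and propagates rewards across turns. The intents, question wording, and probabilities are hypothetical.

```python
import math
from collections import defaultdict
from collections.abc import Callable, Hashable

Prior = dict[Hashable, float]               # hypothesis -> probability
Question = Callable[[Hashable], Hashable]   # hypothesis -> the answer it implies


def entropy(probs) -> float:
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_info_gain(prior: Prior, question: Question) -> float:
    """Prior entropy minus expected posterior entropy after hearing the answer.

    Assumes a noise-free user: each hypothesis determines exactly one answer.
    """
    # Group hypotheses by the answer they would produce.
    buckets: dict[Hashable, dict[Hashable, float]] = defaultdict(dict)
    for hypothesis, p in prior.items():
        buckets[question(hypothesis)][hypothesis] = p
    expected_posterior = 0.0
    for bucket in buckets.values():
        mass = sum(bucket.values())
        if mass > 0:
            # Posterior for this answer is the prior renormalized over the bucket.
            expected_posterior += mass * entropy(p / mass for p in bucket.values())
    return entropy(prior.values()) - expected_posterior


def pick_question(prior: Prior, questions: dict[str, Question]) -> str:
    """Return the question text with the highest expected information gain."""
    return max(questions, key=lambda q: expected_info_gain(prior, questions[q]))


# Hypothetical intents behind "help me pick a monitor", uniform prior (2 bits).
intents = {"gaming": 0.25, "office": 0.25, "photo": 0.25, "dual_setup": 0.25}
monitor_type = {"gaming": "high-refresh", "office": "standard",
                "photo": "color-accurate", "dual_setup": "standard"}
questions = {
    "What type of monitor?": lambda h: monitor_type[h],
    "Is this for work?": lambda h: h != "gaming",
}

for text, q in questions.items():
    print(f"{text:<25} {expected_info_gain(intents, q):.3f} bits")
print("ask first:", pick_question(intents, questions))
```

Here the concrete monitor question scores 1.5 bits against about 0.81 bits for the work question, so it is asked first. The function does not care what a hypothesis is: with candidate reward-coefficient vectors as hypotheses and pairwise "A or B?" comparisons as questions, the same argmax gives a simple active-learning loop in the spirit of the preference-learning result, though the actual selection criterion used there may differ.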
But maximum information gain isn't the whole story, because not every informative question feels useful to the person answering it. The corpus is blunt here (Which clarifying questions actually improve user satisfaction?): questions that target a concrete, foreseeable gap ('What type of monitor?') consistently beat questions that hand the burden back to the user ('What are you trying to do?'). Users engage when they can see how their answer will change the result. So the asymmetry that matters isn't only the system's uncertainty; it is also the user's ability to resolve it cheaply. The best first question sits where high information gain for the system overlaps with low answering cost for the user. ALFA pushes this further by decomposing 'good question' into theory-grounded, trainable attributes such as clarity, relevance, and specificity, and showing that optimizing those attributes separately yields better clarification than training on a single quality score, especially in clinical reasoning, where asking the right clarifying question directly affects the quality of the decision (Can models learn to ask genuinely useful clarifying questions?).
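To make the gain-versus-cost trade-off concrete, here is a sketch that adds a per-question answering cost to the information-gain score. It assumes a hypothetical linear trade-off, score = gain minus `lam` times cost, with hand-assigned costs; none of these numbers come from the cited studies. Under a noise-free model the open-ended "What are you trying to do?" resolves the intent completely and so wins on gain alone, but once its high answering cost is counted the concrete question wins.

```python
import math
from collections import Counter
from collections.abc import Callable, Hashable

Question = Callable[[Hashable], Hashable]


def info_gain(prior: dict[Hashable, float], question: Question) -> float:
    """Expected information gain in bits for a noise-free user.

    When each hypothesis determines its answer, the gain equals the entropy
    of the answer distribution the question induces.
    """
    answer_mass: Counter = Counter()
    for hypothesis, p in prior.items():
        answer_mass[question(hypothesis)] += p
    return -sum(p * math.log2(p) for p in answer_mass.values() if p > 0)


def pick(prior: dict[Hashable, float],
         questions: dict[str, tuple[Question, float]], lam: float) -> str:
    """Pick the question maximizing gain - lam * cost (hypothetical trade-off)."""
    def score(text: str) -> float:
        question, cost = questions[text]
        return info_gain(prior, question) - lam * cost
    return max(questions, key=score)


intents = {"gaming": 0.25, "office": 0.25, "photo": 0.25, "dual_setup": 0.25}
monitor_type = {"gaming": "high-refresh", "office": "standard",
                "photo": "color-accurate", "dual_setup": "standard"}
# Each entry: (answer model, hypothetical answering cost in bit-equivalent units).
questions = {
    "What type of monitor?": (lambda h: monitor_type[h], 0.2),
    "Is this for work?": (lambda h: h != "gaming", 0.1),
    # Open-ended: reveals the intent outright, but is costly for the user to answer.
    "What are you trying to do?": (lambda h: h, 1.5),
}

print("gain only:      ", pick(intents, questions, lam=0.0))
print("gain minus cost:", pick(intents, questions, lam=1.0))
```

Subtraction is one arbitrary choice; a gain-to-cost ratio or a learned cost model would express the same idea that the best first question is high-gain for the system and cheap for the user.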
There's a prior question hiding underneath: when should the system ask at all instead of simply acting? Tool-using agents tend to chain silent searches and drift from the user's intent; conversation analysis offers 'insert-expansions' as a formal framework for consulting the user the moment intent is ambiguous, rather than recovering from a misreading later (When should AI agents ask users instead of just searching?). Asymmetry is the signal for that decision too: if the private information the user holds is decision-critical and cannot be recovered by search, that is exactly when to interrupt and ask.
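One way to make the ask-or-act decision concrete is the expected value of perfect information (EVPI) from decision theory: the difference between the expected utility of acting after learning the user's intent and the best expected utility of acting now. This is a swapped-in decision-theoretic framing, not the conversation-analysis framework the cited note describes; the utility table, `interruption_cost`, and `search_can_resolve` flag are hypothetical inputs.

```python
Prior = dict[str, float]               # intent -> probability
Utility = dict[str, dict[str, float]]  # action -> intent -> utility


def best_expected_utility(prior: Prior, utility: Utility) -> float:
    """Expected utility of the best single action chosen under the current prior."""
    return max(sum(prior[h] * u[h] for h in prior) for u in utility.values())


def evpi(prior: Prior, utility: Utility) -> float:
    """Expected value of perfect information about the user's intent."""
    # With the intent known, the system would choose the best action per intent.
    informed = sum(p * max(u[h] for u in utility.values()) for h, p in prior.items())
    return informed - best_expected_utility(prior, utility)


def decide(prior: Prior, utility: Utility, interruption_cost: float,
           search_can_resolve: bool) -> str:
    """Return 'act', 'search', or 'ask' (a hypothetical three-way policy)."""
    if evpi(prior, utility) <= interruption_cost:
        return "act"      # remaining uncertainty is not worth an interruption
    if search_can_resolve:
        return "search"   # the missing fact is recoverable without the user
    return "ask"          # decision-critical and private to the user


utility = {
    "recommend_144hz": {"gaming": 1.0, "office": 0.2},
    "recommend_ips": {"gaming": 0.3, "office": 1.0},
}
print(decide({"gaming": 0.5, "office": 0.5}, utility, 0.1, search_can_resolve=False))
print(decide({"gaming": 0.95, "office": 0.05}, utility, 0.1, search_can_resolve=False))
```

With an even prior the EVPI is 0.35, well above the interruption cost, so the sketch asks; when the system is already 95% sure the EVPI drops to about 0.04 and it simply acts.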
Why asymmetry is structurally central comes through most clearly in the failure cases. When one model secretly controls every party in a social simulation, it looks socially competent, but that competence evaporates once agents hold genuinely private information, because the model was skipping the grounding work that real asymmetry forces (Why do LLMs fail when simulating agents with private information?). The mirror image is pedagogical: a teacher can correct a student only because the teacher has access to something the student lacks, such as the correct answer or a verifier's output; remove that gap and no learning signal exists (Why does teacher-student information asymmetry enable learning signals?). Asking a user a question is the same move in reverse: the user holds the privileged information, and the question is how the system extracts a corrective signal it cannot generate alone.
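A toy illustration of why asymmetry is what creates a learning signal: measure the correction a teacher can offer as the KL divergence from the student's belief to the teacher's belief. KL divergence is an illustrative choice of measure, not the formulation in the cited note, and the beliefs below are hypothetical. When the teacher has verifier access its belief is certain and the divergence is positive; when the teacher shares the student's uncertainty, the divergence is exactly zero and there is nothing to correct.

```python
import math

Belief = dict[str, float]  # candidate answer -> probability


def corrective_signal(teacher: Belief, student: Belief) -> float:
    """KL(teacher || student) in bits: how far the teacher can move the student.

    Assumes the student gives nonzero probability wherever the teacher does.
    """
    return sum(t * math.log2(t / student[a]) for a, t in teacher.items() if t > 0)


student = {"A": 0.5, "B": 0.5}
with_verifier = {"A": 1.0, "B": 0.0}  # teacher has checked the answer
no_asymmetry = dict(student)          # teacher knows exactly what the student knows

print(corrective_signal(with_verifier, student))  # 1.0
print(corrective_signal(no_asymmetry, student))   # 0.0
```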
Two cautions round out the picture. Adaptively learning a single user's preferences can curdle into sycophancy and echo chambers once the averaging effect of aggregate models is gone, so the questions you ask in order to personalize can quietly optimize for agreement rather than truth (Does personalizing reward models amplify user echo chambers?). And the channel itself is leaky: identical questions get measurably different answers depending on the emotional tone of their framing, so how you ask shapes the information you get back, not just what you ask (Does emotional tone in prompts change what information LLMs provide?). The takeaway a curious reader might not expect: the first question is neither a courtesy nor a search fallback. It is the system deliberately locating the one place where the user knows something it can't, and where the user can tell it cheaply.
Sources (9 notes)
UoT combines uncertainty-aware scenario simulation with information-gain scoring and reward propagation to identify questions whose possible answers maximally reduce diagnostic uncertainty—providing a principled mechanism for specific, high-value clarification rather than generic prompts.
PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce coefficient uncertainty. Users can be personalized via inference-time reward alignment without weight modification.
Clarifying questions that target concrete information gaps ("What type of monitor?") consistently beat those that ask users to rephrase their needs ("What are you trying to do?"). Users engage most when they can foresee how answering improves results.
The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.
Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.
Research shows LLMs perform well when one model controls all interlocutors but fail systematically when agents possess private information. This reveals that apparent social competence relies on grounding work that models skip in omniscient settings.
Social meta-learning requires information asymmetry—the teacher's access to correct answers or verifier output—to generate meaningful corrective signals. Without this asymmetry, teacher and student share identical uncertainty, making pedagogical correction impossible.
Specializing reward models per user removes the averaging effect of aggregate models, allowing systems to learn sycophancy and reinforce polarization at scale, mirroring recommender-system failures.
GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.