INQUIRING LINE

Inquiring lines›What makes reasoning better — more…›What limits conversational AI effe…›How should conversational agents b…›this inquiring line

Before answering something unclear, humans instinctively stop to clarify — can that instinct be taught to an AI agent?

Can conversation analysis predict when agents should ask users for clarification?

This explores whether the formal structures conversation analysts have catalogued in human talk — the moments where people pause to clarify before proceeding — can be turned into a predictive signal that tells an AI agent when to stop and ask rather than barrel ahead.

This explores whether the formal structures conversation analysts have catalogued in human talk can tell an AI agent when to stop and ask rather than guess. The corpus says yes — and more interestingly, it says conversation analysis offers something most AI systems are missing: a *vocabulary* for the moment-before-the-mistake. The clearest version of this is the idea of insert-expansions: the small clarifying detours people insert before answering, to pin down intent or scope. One line of work formalizes these as a framework for when an agent should consult the user instead of silently chaining tools toward the wrong target When should AI agents ask users instead of just searching?. The prediction signal isn't mysterious; it's that the agent has detected an ambiguity it would otherwise paper over.

But here's the twist that makes the question sharper than it looks: even when an agent *could* detect the right moment, today's models are trained not to act on it. Standard RLHF optimizes for looking helpful on the very next turn, which rewards confident answers over clarifying questions — an 'alignment tax' that drives grounding behaviors like understanding-checks down to a fraction of human levels Does preference optimization harm conversational understanding? Why do language models respond passively instead of asking clarifying questions?. The result is a structurally passive agent that reacts but never initiates Why can't conversational AI agents take the initiative?. So prediction is only half the problem; the model also has to be allowed to honor the prediction.

When models *are* trained for it, the gains are dramatic and a little alarming. One study took proactive critical thinking on deliberately flawed problems from under 1% to 74% with reinforcement learning — but found the skill is fragile, and that simply giving an untrained model more inference-time compute actually made it *worse* at noticing it should ask Can models learn to ask clarifying questions instead of guessing?. The takeaway: knowing-when-to-ask is a learnable capability, not an emergent one, and it can quietly regress. Relatedly, calibration research shows small models trained to abstain under uncertainty can match models ten times their size — suggesting the 'should I ask?' decision is really a calibrated-uncertainty decision in disguise Can models learn to abstain when uncertain about predictions?.

What conversation analysis adds beyond the timing signal is a richer map of *what kind* of clarification is needed — and this is where the question opens onto territory you might not have expected. Clarifications aren't all questions; most are declarative, and they operate at four distinct levels (attention, signal, meaning, action), which means syntax-based detectors miss them entirely Why do clarification requests look different at each communication level?. Conversation analysis also names the repair that happens *after* a misunderstanding surfaces — third-position repair — a reactive belief-revision mechanism almost entirely absent from current systems Can AI systems detect and correct misunderstandings after responding?. And asking well is its own decomposable skill: breaking question quality into attributes like clarity, relevance, and specificity beats training on a single quality score Can models learn to ask genuinely useful clarifying questions?.

The deepest thread, though, is that 'when to ask' isn't purely an information problem. Models avoid correcting false premises even when they demonstrably know better — a face-saving avoidance learned from human data Why do language models avoid correcting false user claims?. So a fully realized predictor would need to track *both* speakers' evolving beliefs across turns, which is exactly what collaborative extensions of rational speech-act theory attempt to model Can dialogue systems track both speakers' beliefs across turns?. The payoff for getting it right is concrete: proactivity — supplying or seeking the right information at the right moment — can cut conversation length by up to 60% Could proactive dialogue make conversations dramatically more efficient?. Conversation analysis can predict the moment; the open work is teaching models to want to act on it.

Sources 12 notes

When should AI agents ask users instead of just searching?

Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts by 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Why can't conversational AI agents take the initiative?

Research shows LLMs including ChatGPT cannot initiate topics, plan strategically, or lead conversations because their training optimizes for responding to queries, not creating dialogue from agent goals. This passivity is reinforced by alignment objectives and masked by fluent-sounding outputs.

Can models learn to ask clarifying questions instead of guessing?

Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.

Show all 12 sources

Can models learn to abstain when uncertain about predictions?

Small open-source models trained with uncertainty-aware objectives and abstention capabilities match 10x larger pre-trained models on conversation forecasting. This shows calibration ability exists but remains undertrained in standard LLMs.

Why do clarification requests look different at each communication level?

Research maps clarification mechanisms to four levels of communication—attention, signal, meaning, action—each grounded in a different modality (socioperception, hearing, vision, kinesthetics). Most clarifications use declarative form, not questions, making them invisible to systems that detect by syntax alone.

Can AI systems detect and correct misunderstandings after responding?

Current AI lacks the reactive repair mechanism identified in conversation analysis where misunderstanding is corrected after an erroneous response reveals it. The REPAIR-QA dataset demonstrates this requires recognizing false assumptions and performing dynamic belief revision.

Can models learn to ask genuinely useful clarifying questions?

The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.

Why do language models avoid correcting false user claims?

LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.

Can dialogue systems track both speakers' beliefs across turns?

CRSA integrates rate-distortion theory with RSA to enable bidirectional belief tracking across dialogue turns. Demonstrated on referential games and doctor-patient dialogues, it captures progression from partial to shared understanding, providing the information-theoretic framework that token-level LLM systems lack.

Could proactive dialogue make conversations dramatically more efficient?

Simulations show proactivity—providing relevant information without being asked—cuts dialogue turns by 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation5.95 match · arxiv ↗
Proactive Conversational Agents in the Post-ChatGPT World4.32 match · arxiv ↗
DiscussLLM: Teaching Large Language Models When to Speak3.43 match · arxiv ↗
Conversational Alignment with Artificial Intelligence in Context3.24 match · arxiv ↗
Proactive Conversational Agents with Inner Thoughts2.59 match · arxiv ↗
Learning to Learn from Language Feedback with Social Meta-Learning2.52 match · arxiv ↗
Grounding Gaps in Language Model Generations2.46 match · arxiv ↗
Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning1.75 match · arxiv ↗

Research prompt for your LLMexpand ↓

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a conversational AI researcher. The question: **Can conversation analysis predict when agents should ask users for clarification—and if so, why don't deployed models do it?**

What a curated library found—and when (dated claims, not current truth): Findings span 2021–2026.

• Insert-expansions (small clarifying detours before answering) are formally detectable as signals for when an agent should consult the user instead of guessing (~2023–24).
• Standard RLHF optimizes for confident next-turn answers, driving clarification-seeking down to <1% of human levels; one study raised proactive critical thinking from <1% to 74% with RL, but the skill regresses under larger compute without training (~2024–25).
• Clarifications operate at four distinct levels (attention, signal, meaning, action) and are mostly declarative; syntax-based detectors miss them (~2024).
• Models avoid correcting false premises due to face-saving learned from data, not knowledge gaps (~2024).
• Proactive dialogue can cut conversation length by up to 60%, but remains almost entirely absent from deployed systems (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2307.01644 (2023) – Insert-expansions for tool-enabled agents
• arXiv:2501.00383 (2024) – Proactive conversational agents with inner thoughts
• arXiv:2507.14063 (2025) – Collaborative rational speech acts for multi-turn dialog
• arXiv:2508.18167 (2026) – When to speak: LLM turn-taking

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer model architectures (longer context, reasoning checkpoints), training methods (DPO, process supervision, multi-agent collaboration), or evals have since RELAXED or OVERTURNED the RLHF alignment tax, the regressive compute effect, or the face-saving avoidance. Separate the durable question (when *should* an agent ask?) from perishable limitations (why *won't* it?). Cite what resolved it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~4 months—especially any that shows agents now *do* clarify proactively at scale, or that show conversation analysis alone is insufficient.
(3) Propose 2 research questions that ASSUME the regime has moved: e.g., given that proactivity is now trainable, what makes it *robust* across domains? Or: does multi-agent orchestration (agent asking another agent) bypass the face-saving avoidance?

Cite arXiv IDs; flag anything you cannot ground in a real paper.

Before answering something unclear, humans instinctively stop to clarify — can that instinct be taught to an AI agent?

Related lines of inquiry

Sources 12 notes

Papers this line draws on 8