Why do language models respond passively instead of asking clarifying questions?
Explores whether the reward signals used to train language models might actively discourage them from seeking clarification or taking initiative in conversations, and what alternative training approaches might enable more collaborative dialogue.
CollabLLM makes the training mechanism behind passive responding explicit: "Large Language Models are typically trained with next-turn rewards, limiting their ability to optimize for long-term interaction." The result: models respond passively to ambiguous or open-ended user requests, failing to help users reach their ultimate intents and leading to inefficient conversations.
The fix is multi-turn-aware rewards — rewards that estimate the long-term contribution of a response to the overall interaction quality, not just its immediate helpfulness. By reinforcement fine-tuning with these rewards, CollabLLM enables models to:
- Actively uncover user intent through clarifying questions
- Offer insightful suggestions that serve multi-turn goals
- Go beyond responding to requests toward genuine collaboration
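One way to make "long-term contribution" concrete is to score a candidate response by simulating the turns that follow it and averaging a conversation-level score. The sketch below does this with a toy user simulator. The simulator, the per-turn costs, and the 0.3 guess accuracy are assumptions chosen for illustration. They are not CollabLLM's actual estimator, task metrics, or numbers.

```python
# Toy contrast between a next-turn reward and a multi-turn-aware reward.
# Every name and number here is a hypothetical illustration, not CollabLLM code.
import random
from statistics import mean
from typing import Callable

GUESS_ACCURACY = 0.3  # assumed chance that answering an ambiguous request directly matches the intent
TOKEN_COST = {"ask": 0.05, "answer": 0.25}  # assumed user effort per turn: short question vs. full draft
MAX_TURNS = 5  # turn budget for each simulated conversation


def next_turn_reward(action: str) -> float:
    """Scores only the immediate response: answering looks helpful, asking looks evasive."""
    return 1.0 if action == "answer" else 0.2


def rollout(first_action: str, rng: random.Random) -> float:
    """Simulates the rest of one conversation and returns a conversation-level score.

    Toy user simulator: a clarifying question reveals the intent. An answer given
    without the intent succeeds with probability GUESS_ACCURACY, and after a failed
    guess the user restates what they want. After the first turn the model answers.
    """
    intent_known = False
    cost = 0.0
    action = first_action
    for _ in range(MAX_TURNS):
        cost += TOKEN_COST[action]
        if action == "ask":
            intent_known = True
        elif intent_known or rng.random() < GUESS_ACCURACY:
            return 1.0 - cost  # task solved; subtract the effort spent getting there
        else:
            intent_known = True  # the user corrects the wrong guess
        action = "answer"
    return -cost  # never solved within the turn budget


def multi_turn_reward(action: str, n_samples: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of a response's long-term contribution to the conversation."""
    rng = random.Random(seed)
    return mean(rollout(action, rng) for _ in range(n_samples))


def best_action(reward_fn: Callable[[str], float]) -> str:
    """The action a policy would be pushed toward under the given reward."""
    return max(["answer", "ask"], key=reward_fn)


if __name__ == "__main__":
    print(best_action(next_turn_reward))   # answer
    print(best_action(multi_turn_reward))  # ask
```

Under these assumed numbers, the expected multi-turn reward of answering immediately is 0.3 × 0.75 + 0.7 × 0.5 = 0.575. Asking first yields 0.70. The two reward functions therefore pick opposite actions: the next-turn reward reinforces the guess, and the multi-turn reward reinforces the question.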
This is a direct mechanistic explanation for the alignment tax. The note "Does preference optimization harm conversational understanding?" argues that RLHF training degrades multi-turn reliability. CollabLLM identifies the specific training signal responsible (next-turn rewards) and proposes a specific fix (rewards that account for multi-turn consequences).
The connection to proactivity is equally direct. As "Why can't conversational AI agents take the initiative?" discusses, passivity is not merely a missing feature: it is actively trained in by next-turn reward optimization. Proactivity cannot simply be layered on top of a training signal that rewards only reactive helpfulness.
CollabLLM is evaluated on three challenging tasks, including document creation. In these settings multi-turn collaboration is essential and single-turn helpfulness is insufficient, which grounds the claim in practical interaction scenarios rather than abstract capability measurement.
The Intent Mismatch paper directly supports this causal mechanism. It argues that premature assumptions in multi-turn conversation are a rational response to RLHF helpfulness training: models construct plausible task formulations for "typical" users and produce provisional answers because the training objective penalizes evasion and rewards helpfulness. Its proposed fix, a Mediator-Assistant architecture that decouples intent understanding from task execution, complements CollabLLM's reward-signal approach with an architectural intervention. Both identify next-turn optimization as the root cause. They differ on whether the fix lies in changing the reward (CollabLLM) or in restructuring the system (Intent Mismatch).
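The note describes the Mediator-Assistant design only as decoupling intent understanding from task execution. The sketch below is one toy reading of that idea, not the Intent Mismatch paper's implementation. A hypothetical mediator consolidates every user turn into an explicit intent spec, and the assistant acts only on that spec. The slot schema, the "slot=value" parsing that stands in for a model, and the clarifying-question branch are all assumptions.

```python
# Toy Mediator-Assistant pipeline: intent understanding is separated from task execution.
# The slot schema, the "slot=value" parsing, and all function names are hypothetical.
from dataclasses import dataclass
from typing import Dict, List, Optional

REQUIRED_SLOTS = ("topic", "audience", "length")  # assumed intent schema for a writing task


@dataclass
class MediatorOutput:
    clarifying_question: Optional[str] = None
    consolidated_request: Optional[Dict[str, str]] = None


def mediator(history: List[str]) -> MediatorOutput:
    """Builds one explicit intent spec from all user turns; later turns override earlier ones.

    A real mediator would be a model. Here each turn is parsed as 'slot=value' pairs separated by ';'.
    """
    spec: Dict[str, str] = {}
    for turn in history:
        for part in turn.split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                spec[key.strip()] = value.strip()
    missing = [slot for slot in REQUIRED_SLOTS if slot not in spec]
    if missing:
        # Surface the gap instead of filling it with a "typical user" assumption.
        return MediatorOutput(clarifying_question=f"Before I start: what {missing[0]} do you have in mind?")
    return MediatorOutput(consolidated_request=spec)


def assistant(request: Dict[str, str]) -> str:
    """Executes the task from the consolidated request only, never from the raw turns."""
    return f"[{request['length']} draft on {request['topic']} for {request['audience']}]"


def respond(history: List[str]) -> str:
    out = mediator(history)
    if out.consolidated_request is None:
        return out.clarifying_question
    return assistant(out.consolidated_request)


if __name__ == "__main__":
    history = ["topic=reward design"]
    print(respond(history))  # Before I start: what audience do you have in mind?
    history.append("audience=ML engineers; length=one page")
    print(respond(history))  # [one page draft on reward design for ML engineers]
```

The sketch illustrates one design choice: the assistant never sees the raw turns, so it cannot anchor on an early, under-specified request. Whatever the mediator has not resolved is surfaced rather than guessed.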
Inquiring lines that use this note as a source
This note is a source for these synthesized inquiries.
- Does the same uncertainty-driven logic appear in other conversation systems?
- Can dialogue systems abstain from responding when uncertainty is too high?
- Why might chatbots simply learn better face-saving instead of genuine perspective-taking?
- How does multi-turn conversation degrade AI intent alignment?
- Why does preference optimization erode conversational grounding in AI assistants?
- What makes AI posts less likely to invite replies than human-written content?
- Why do comprehensive posts without uncertainty tend to suppress conversation?
- Why do published prose training data omit solicitation as a discourse property?
- How do engagement metrics reward AI content that hollows out conversationality?
- What role does conversation state tracking play in timing ask versus recommend?
- Can you weaken communication without eliminating it altogether?
- How does Stalnaker's common ground model apply to machine conversation?
- How does the silent token approach compare to modeling intrinsic motivation for speaking?
- Why does context collapse pose risks in high-stakes conversations?
- Do language models share the same cooperative truth-seeking rules as humans?
- Do language models understand tacit workplace norms and unspoken social rules?
- How does conversational format activate System 1 acceptance in users?
- Can fine-tuning on dialogue transcripts teach true conversational repair operations?
- Can curiosity-driven dialogue incrementally discover user interest journeys in real time?
- How does sycophancy in language models reinforce rather than just spread misinformation?
- Why can't language models conduct genuine Socratic questioning in therapy sessions?
- Why do conversational pivots require explicit re-prompting instead of natural evolution?
- How do dialogue dimensions predict explanation success across different exchanges?
- Can proactive critical thinking alone enable models to request clarification effectively?
- What dialogue dynamics distinguish negotiation from standard information-provision tasks?
- Can transformer attention architecture explain why chatbots default to sycophancy?
- Can language systems learn when to ask for clarification instead of choosing one reading?
- Why do large language models follow user drift instead of maintaining topic focus?
- Can systems guide users adaptively without imposing predetermined dialogue structures?
- Why do conversational queries drift away from what triggered them?
- Why do dialogue systems fail to detect declarative clarification requests?
- Can language models ground clarifications without vision and kinesthetic modalities?
- Why do token-level language models fail at utterance-level pragmatic optimization?
- What makes active reasoning through dialogue harder than passive reasoning?
- Why do current conversational AI systems fail to develop shared vocabulary with users?
- How does training with preference pairs teach language models to form conventions?
- How does monological training on text differ from dialogical training in conversation?
- What training on actual interaction would show that text-only training cannot?
- What role do time intervals play in shaping conversation responses?
- Why do large language models fail at taking conversational initiative?
- Can AI learn when to speak in a conversation?
- What makes LLM agents default to passive helpfulness without curiosity rewards?
- How does intrinsic motivation drive conversational agents beyond passive responsiveness?
- Why do passive conversational agents fail at collaborative decision-making?
- What speaker selection protocol prevents both stalling and premature convergence?
- How do dialogue acts and explanation moves interact to predict understanding success?
- Can decreased engagement be distinguished from genuine semantic contradiction?
- How do dialogue coherence failures map onto the three discourse components?
- Why do practitioners default to prompting without recognizing its limits?
- What interaction patterns preserve human learning when AI provides domain answers?
- How do probabilistic dialogue systems handle ASR errors differently?
- Why do embodied agents outperform text chatbots with identical AI models?
- Can language models implement therapeutic skills like Socratic questioning in real conversations?
- Can language models understand the implicit emotional intent behind questions?
- How do question acts and intents map to speech act theory?
- Does current empathetic AI misalign with how humans actually ask questions?
- Does RLHF training suppress exploratory and qualifying language?
- How does conversational closure differ from genuine problem understanding?
- Why do Claude and Llama optimize for different dialogue outcomes?
- How vulnerable are language models themselves to multi-turn persuasive pressure?
- Why do language models naturally under-abstain instead of over-abstain?
- How do graduated phase rewards emerge complex dialogue behavior from simple objectives?
- How should task-oriented and socially-oriented dialogue acts receive different training signals?
- Why do AI agents default to passivity when deferral timing is unclear?
- Can conversation analysis predict when agents should ask users for clarification?
- What role does contingent interaction play in activating social response norms?
- Can proactive critical thinking train models to request clarification actively?
- Why does RLHF training discourage the conversational repair work agents need?
- Can AI systems recover from premature assumptions made early in multi-turn conversations?
- Why do traditional interfaces bypass the intention formation problem that language models expose?
- Can multi-turn reinforcement learning improve tool use in language models?
- Why can't AI participate in real communicative events?
- Can conversational AI achieve mutual understanding if trained only on text?
- How does ambiguity detection connect to models' ability to ask clarifying questions?
- Can targeted post-training teach AI systems to form ad-hoc linguistic conventions?
- Why do current language models fail to match human linguistic synchrony with clients?
- Does preference optimization training reduce linguistic entrainment in language models?
- Can real-time linguistic coordination tracking improve conversational AI quality?
- Do language models actively adopt false beliefs under sustained conversational pressure?
- What communicative optimization principles do language models fail to acquire?
- Do language models calibrate to actual human pragmatic norms?
- Can language models develop genuine social grounding through human interaction?
- Can topic planning and response generation reduce dialogue turns?
- What data would be needed to train proactive conversational systems?
- What structural changes enable agents to ask clarifying questions?
- Can static word-sharing create genuine communicative grounding between humans and models?
- Why do chatbots default to external help instead of intrinsic motivation strategies?
- Do agent frameworks adequately compensate for LLM conversational passivity?
- Can language models correct false assumptions or only reinforce them?
- Can language models produce language more efficiently through interaction?
- Can models learn to identify what information is missing from questions?
- What training signals would teach models when not to reason?
- Can hierarchical reinforcement learning manage phase-dependent initiative switching in dialogue?
- Can reward models trained for engagement fix the informativeness problem?
- Could reward signals incentivize active intent discovery over passive response generation?
- Why does RLHF training push language models toward overly cheerful personas?
- Why do weaker language models fail at multi-turn strategic questioning?
- Can language models ask clarifying questions when sentences are ambiguous?
- How does RLHF helpfulness training drive premature assumptions in multi-turn dialogue?
- Why do language models struggle with context-dependent pragmatic interpretation?
- Why do chatbots fail to recognize when someone is ambivalent about change?
- Do models trained for reasoning lose their ability to decline questions?
- Does preference optimization degrade other conversational properties besides grounding?
- Can curiosity reward during conversation compete with simulated interaction optimization for alignment?
- How do discourse relation types improve dialogue beyond sentence-level semantic matching?
- Can multi-turn conversations manipulate language model reasoning in similar ways to personas?
- Why does face-saving avoidance drive chatbots to agree rather than confront?
- Why do language models avoid directness when face-saving rather than for civility?
- Can proactive AI agents deploy politeness strategies without appearing intrusive?
- Can offline RL and pragmatic inference together improve dialogue agent reliability?
- Can RL with verifiable rewards improve dialogue quality better than preference optimization?
- Why do language models prefer accommodating false information over rejecting it?
- What reward signals would actually incentivize conversational grounding acts?
- What role does accommodation play in making discourse coherent?
- What makes proactive conversational agents feel intrusive versus helpful to users?
- What social boundaries must proactive agents respect during conversation?
- Can question quality be trained separately from the decision to ask?
- How do conversational agents overcome structural passivity and goal awareness gaps?
- What distinguishes proactive information provision from proactive clarification seeking?
- Why are task-oriented dialogue datasets systematically underrepresenting human proactive behavior?
- How can reward structures teach models when to speak and when to stay silent?
- Does proactive agent design improve conversation efficiency or create user frustration?
- Why do language models prefer certain response styles regardless of what the prompt asks?
- Can you weaken communication without eliminating it entirely?
- Why do RLHF-trained models struggle with proactive emotional attunement in conversations?
- Does preference optimization actually erode conversational grounding in language models?
- Can language models recognize when to ignore off-topic information in conversations?
- What causes length bias in language model reward models?
- What specific repair mechanisms maintain intersubjectivity during conversation?
- Why do conversational systems benefit from post-thinking between user turns?
- Why do language models use twice as many words per conversation turn?
- Why do language models respond to human social influence patterns?
- What training approach enables models to proactively request clarification?
- How should dialogue systems represent and update uncertainty from noisy ASR input?
- Why do chatbots generate less student-initiated dialogue than human peers?
- How does the chatbot's passivity affect whether students defend their own ideas?
- Why do reasoning-optimized models still fall for logical fallacies in conversation?
- Can attention patterns alone explain sycophantic model behavior without reasoning?
- Why do language models struggle with evaluative tasks like weighing competing viewpoints?
- How does dialogue during training shape the ability to ignore word frequency?
- Which conversation types most reliably cause models to drift from Assistant mode?
- How does monological training versus dialogical interaction shape what models can do?
- What expectations does human conversation activate that AI should avoid triggering?
- Can models learn to stop thinking when a question lacks necessary information?
- Why do RLHF-trained models default to problem-solving during emotional disclosure?
- How does RLHF training push chatbots toward problem-solving over exploration?
- What training data barriers prevent LLMs from learning real Socratic dialogue?
- Do conversational agents need goal awareness to initiate grounding work themselves?
- Can conversational prompt engineering bridge the articulation gap?
- Can reinforcement learning teach AI when to ask clarifying questions?
- What makes proactivity useful instead of intrusive in conversation?
- What design choices actually make language models more persuasive?
- Can a separate mediator layer improve intent understanding before task execution?
- How does RLHF training reward models for guessing over asking clarifying questions?
- What multi-turn reward structures would encourage active intent discovery?
- Can warmth training in language models actually reduce their reliability?
- Can AI take initiative by questioning without being proactive in directive ways?
- What communicative work do fluent conversations perform that AI systems skip?
- What prevents AI from recovering after conversations take a wrong turn?
- What would it mean for a language model to canvas counterpositions?
- How does local helpfulness per turn conflict with maintaining session-level conversational goals?
- Why do conversational agents lack the goal awareness needed to lead rather than just respond?
- Why do safety-trained models refuse questions they could actually answer well?
- Why do language models produce unfaithful chain of thought explanations?
- How might dual-process dialogue use information gain to trigger clarification?
- Can emotion-grounded rewards replace coarse bonus signals in hierarchical dialogue RL?
- How does unilateral interpretation differ from mutual communicative uptake?
- Can statistical token processing create the accountability needed for dialogue?
- Why do conversations with good openings but abrupt pivots fail most visibly?
- How does effort mismatch between user and model appear in conversation geometry?
- Can structural conversation analysis replace text-based reward signals for AI alignment?
- Do instruction-tuned models prefer conversational over formal source language?
- Does preference optimization distort how models represent human communicative dynamics?
- Why do models struggle with asking questions in multi-turn conversational reasoning tasks?
- Can Q-priming further strengthen clarifying question behavior beyond social meta-learning alone?
- How does treating conversation as a resource change what models learn to do?
- How do turn-level retrieval failures differ from dialogue-level accumulation failures?
- Can models learn to ask clarifying questions instead of making assumptions?
- How much does forcing single-choice answers damage alignment with complex intent?
- How do students learn to extract corrective information from asymmetric dialogue?
- What behavioral differences emerge from symmetric versus asymmetric peer discussion loops?
- Why do current large language models fail to entrain with users?
- What behavioral signals let users detect communicative flexibility in AI?
- How does preference optimization erode the conversational grounding it aims to improve?
- Can training on text corpora teach what communicative acts produce?
- What explicit objectives would train agents toward minimal disclosure instead of completion?
- What distinguishes first-order from second-order agency in language models?
- Why do outcome-based rewards train language models to over-engage rather than abstain?
- Can structured questioning prompts improve reasoning beyond standard conversational training?
- Why do models confirm seeing hints but rarely mention them unprompted?
- Why do sycophancy hints show the worst acknowledgment gap?
- Can interventions on individual features reliably steer language model behavior?
- Can explicit W-questions in transparency frameworks reduce emotional manipulation risks in mental health chatbots?
- Do models naturally learn to ask clarifying questions without explicit supervision?
- How can models select the optimal question to ask given multiple uncertainties?
- Why do standard next-token prediction models struggle with conversational initiative?
- How do users misattribute social competence to language models in assistant roles?
Related concepts in this collection
- Does preference optimization harm conversational understanding?
  Exploring whether RLHF training that rewards confident, complete responses undermines the grounding acts (clarifications, checks, acknowledgments) that actually build shared understanding in dialogue.
  CollabLLM identifies next-turn rewards as the specific mechanism and proposes multi-turn rewards as the fix.
- Why can't conversational AI agents take the initiative?
  Explores whether current LLMs lack the structural ability to lead conversations, set goals, or anticipate user needs, and what architectural changes might enable proactive dialogue.
  Passivity is trained in by next-turn optimization.
- Does RLHF training push therapy chatbots toward problem-solving?
  Explores whether reward signals optimizing for task completion in RLHF inadvertently train therapeutic chatbots to prioritize solutions over emotional validation, potentially undermining clinical effectiveness.
  A clinical-domain instance of next-turn reward bias.
- Why do language models lose performance in longer conversations?
  Does multi-turn degradation stem from fundamental model limitations, or from misalignment between what users mean and what models assume? Understanding the root cause could guide better solutions.
  A complementary architectural fix to CollabLLM's reward-signal fix.
- Why do standard alignment methods ignore partner interventions?
  Standard RLHF and DPO optimize for token-level quality but may structurally prevent agents from meaningfully incorporating partner input. This explores whether the training objective itself blocks collaborative reasoning.
  ICR demonstrates the deeper mechanism: next-turn rewards make agents blind to partner contributions. Counterfactual invariance training is an alternative fix that produces partner-awareness as an emergent property, complementing CollabLLM's multi-turn reward approach.
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation
- CollabLLM: From Passive Responders to Active Collaborators
- Proactive Conversational Agents in the Post-ChatGPT World
- Can Large Language Models Reason and Optimize Under Constraints?
- DiscussLLM: Teaching Large Language Models When to Speak
- Proactive Conversational Agents with Inner Thoughts
- Learning to Learn from Language Feedback with Social Meta-Learning
- The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
Original note title
next-turn reward optimization limits multi-turn collaboration — multi-turn-aware rewards enable models to actively uncover intent rather than passively respond