SYNTHESIS NOTE

Why do language models respond passively instead of asking clarifying questions?

Explores whether the reward signals used to train language models might actively discourage them from seeking clarification or taking initiative in conversations, and what alternative training approaches might enable more collaborative dialogue.

Synthesis note · 2026-02-22 · sourced from Conversation Agents

CollabLLM makes the training mechanism behind passive responding explicit: "Large Language Models are typically trained with next-turn rewards, limiting their ability to optimize for long-term interaction." The result: models respond passively to ambiguous or open-ended user requests, failing to help users reach their ultimate intents and leading to inefficient conversations.

The fix is multi-turn-aware rewards — rewards that estimate the long-term contribution of a response to the overall interaction quality, not just its immediate helpfulness. By reinforcement fine-tuning with these rewards, CollabLLM enables models to:

- Actively uncover user intent through clarifying questions
- Offer insightful suggestions that serve multi-turn goals
- Go beyond responding to requests toward genuine collaboration
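The contrast between a next-turn reward and a multi-turn-aware reward can be illustrated with a toy simulation. This is a minimal sketch under invented assumptions, not CollabLLM's implementation: the three intents, the scoring constants, and the function names `next_turn_reward`, `simulate_conversation`, and `multi_turn_reward` are all hypothetical. It estimates the long-term value of a response by averaging the outcome of many simulated conversations that start from that response.

```python
import random
from statistics import mean

# Hypothetical toy setup: the user wants one of three document styles,
# but the opening request does not say which one.
INTENTS = ["formal", "casual", "technical"]


def next_turn_reward(response: str) -> float:
    """Toy immediate-helpfulness score (invented constants).

    Judged on the current turn alone, a direct answer looks more
    helpful than a clarifying question.
    """
    return 0.2 if response == "ask" else 0.6


def simulate_conversation(response: str, rng: random.Random) -> float:
    """Roll one simulated conversation forward and score its final outcome."""
    true_intent = rng.choice(INTENTS)
    if response == "ask":
        # One extra turn, after which the assistant knows the intent.
        turns, success = 2, True
    else:
        # Answering immediately means guessing the intent.
        guess = rng.choice(INTENTS)
        success = guess == true_intent
        turns = 1 if success else 3  # a wrong guess costs a correction and a redo
    outcome = 1.0 if success else 0.3
    return outcome - 0.1 * turns  # small penalty per turn used


def multi_turn_reward(response: str, n_rollouts: int = 2000, seed: int = 0) -> float:
    """Estimate long-term value as the mean outcome over simulated rollouts."""
    rng = random.Random(seed)
    return mean(simulate_conversation(response, rng) for _ in range(n_rollouts))


if __name__ == "__main__":
    for response in ("ask", "answer"):
        print(response, next_turn_reward(response), round(multi_turn_reward(response), 2))
```

In this toy setting the two signals rank the responses in opposite orders: the next-turn score prefers answering immediately, while the rollout-based estimate prefers asking, because a wrong guess costs extra turns and a worse final outcome. A policy optimized only on the first signal would therefore learn not to ask.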

This gives the alignment tax a direct mechanism. From the note "Does preference optimization harm conversational understanding?", we know that RLHF training degrades multi-turn reliability. CollabLLM identifies the specific training signal responsible, next-turn rewards, and proposes a specific fix: rewards that account for multi-turn consequences.

The connection to proactivity is just as direct. Read alongside the note "Why can't conversational AI agents take the initiative?", the passivity is not just a missing feature: it is actively trained in by next-turn reward optimization. Proactivity cannot be added on top of a training signal that rewards only reactive helpfulness.

CollabLLM is evaluated on three challenging tasks, including document creation. These are contexts where multi-turn collaboration is essential and single-turn helpfulness is insufficient, which grounds the claim in practical interaction scenarios rather than abstract capability measurement.

The Intent Mismatch paper directly supports this causal mechanism: it argues that premature assumptions in multi-turn conversation are rational under RLHF helpfulness training. Models construct plausible task formulations for "typical" users and produce provisional answers because the training objective penalizes evasion and rewards helpfulness. The proposed fix, a Mediator-Assistant architecture that decouples intent understanding from task execution, complements CollabLLM's reward-signal approach with an architectural intervention. Both identify next-turn optimization as the root cause; they differ on whether the fix is to change the reward (CollabLLM) or to restructure the system (Intent Mismatch).
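The idea of decoupling intent understanding from task execution can be sketched as two separate functions joined by an explicit task specification. This is an illustrative toy only: the `TaskSpec` fields, the `REQUIRED_SLOTS` tuple, the `slot: value` message convention, and the choice of which component asks the question are all assumptions made for the example, and the paper's actual division of work is not described in this note.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Explicit task formulation handed from the mediator to the assistant."""
    goal: str
    constraints: dict = field(default_factory=dict)
    missing: list = field(default_factory=list)


# Hypothetical slots that a writing task needs before drafting can start.
REQUIRED_SLOTS = ("audience", "length")


def mediator(history: list[str]) -> TaskSpec:
    """Intent understanding only: read the whole conversation, never draft."""
    constraints = {}
    for turn in history:
        for slot in REQUIRED_SLOTS:
            prefix = f"{slot}:"
            if prefix in turn:
                # Later turns overwrite earlier values for the same slot.
                constraints[slot] = turn.split(prefix, 1)[1].strip()
    missing = [slot for slot in REQUIRED_SLOTS if slot not in constraints]
    return TaskSpec(goal=history[0], constraints=constraints, missing=missing)


def assistant(spec: TaskSpec) -> str:
    """Task execution only: act on the spec, or ask when it is incomplete."""
    if spec.missing:
        return f"Before I draft this, could you tell me the {spec.missing[0]}?"
    details = ", ".join(f"{key}={value}" for key, value in spec.constraints.items())
    return f"Drafting '{spec.goal}' with {details}."


def respond(history: list[str]) -> str:
    """Run the two stages in sequence for one assistant turn."""
    return assistant(mediator(history))


if __name__ == "__main__":
    print(respond(["Write a summary of the report"]))
    print(respond(["Write a summary of the report",
                   "audience: executives",
                   "length: one page"]))
```

Because the executing component only ever sees the explicit specification, a gap in the user's intent shows up as a visible `missing` entry instead of being silently filled with a guess about the "typical" user.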

Related concepts in this collection 5

Does preference optimization harm conversational understanding? Exploring whether RLHF training that rewards confident, complete responses undermines the grounding acts—clarifications, checks, acknowledgments—that actually build shared understanding in dialogue.
CollabLLM identifies next-turn rewards as the specific mechanism; proposes multi-turn rewards as fix
Why can't conversational AI agents take the initiative? Explores whether current LLMs lack the structural ability to lead conversations, set goals, or anticipate user needs—and what architectural changes might enable proactive dialogue.
passivity is trained in by next-turn optimization
Does RLHF training push therapy chatbots toward problem-solving? Explores whether reward signals optimizing for task completion in RLHF inadvertently train therapeutic chatbots to prioritize solutions over emotional validation, potentially undermining clinical effectiveness.
clinical domain instance of next-turn reward bias
Why do language models lose performance in longer conversations? Does multi-turn degradation stem from fundamental model limitations, or from misalignment between what users mean and what models assume? Understanding the root cause could guide better solutions.
complementary architectural fix to CollabLLM's reward-signal fix
Why do standard alignment methods ignore partner interventions? Standard RLHF and DPO optimize for token-level quality but may structurally prevent agents from meaningfully incorporating partner input. This explores whether the training objective itself blocks collaborative reasoning.
ICR demonstrates the deeper mechanism: next-turn rewards make agents blind to partner contributions; counterfactual invariance training is an alternative fix that produces partner-awareness as an emergent property, complementing CollabLLM's multi-turn reward approach
