SYNTHESIS NOTE
Conversational AI and Personalization · Psychology, Society, and Alignment

Why do language models respond passively instead of asking clarifying questions?

Explores whether the reward signals used to train language models might actively discourage them from seeking clarification or taking initiative in conversations, and what alternative training approaches might enable more collaborative dialogue.

Synthesis note · 2026-02-22 · sourced from Conversation Agents

CollabLLM makes the training mechanism behind passive responding explicit: "Large Language Models are typically trained with next-turn rewards, limiting their ability to optimize for long-term interaction." The result: models respond passively to ambiguous or open-ended user requests, failing to help users reach their ultimate intents and leading to inefficient conversations.

The fix is multi-turn-aware rewards: rewards that estimate the long-term contribution of a response to overall interaction quality, not just its immediate helpfulness. By reinforcement fine-tuning with these rewards, CollabLLM enables models to actively uncover user intent rather than respond passively.

This is a direct mechanistic explanation for the alignment tax. The related note "Does preference optimization harm conversational understanding?" shows that RLHF training degrades multi-turn reliability. CollabLLM identifies the specific training signal responsible: next-turn rewards. It also proposes a specific fix: rewards that account for multi-turn consequences.
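
The contrast between the two reward signals can be made concrete with a toy sketch. This is not CollabLLM's implementation: the scoring rules, the user simulator `simulate_rest`, the success probability, and the per-turn cost are all hypothetical stand-ins chosen for illustration. The only idea taken from the note is that a multi-turn-aware reward scores a response by the quality of the whole interaction it leads to, here estimated by averaging over sampled continuations.

```python
import random
from typing import List

Conversation = List[str]


def next_turn_reward(response: str) -> float:
    """Toy immediate-helpfulness score: a direct answer beats a question."""
    return 0.2 if response.endswith("?") else 0.8


def simulate_rest(conversation: Conversation, rng: random.Random) -> Conversation:
    """Hypothetical user simulator that rolls the dialogue forward to its end."""
    if conversation[-1].endswith("?"):
        # A clarifying question surfaces the hidden intent, so the task succeeds.
        return conversation + [
            "user: a haiku about rain",
            "assistant: <haiku about rain>",
            "user: [satisfied]",
        ]
    # A direct guess matches the hidden intent only some of the time (assumed 25%).
    if rng.random() < 0.25:
        return conversation + ["user: [satisfied]"]
    return conversation + [
        "user: that is not what I wanted",
        "assistant: <another guess>",
        "user: [gave up]",
    ]


def conversation_score(conversation: Conversation, turn_cost: float = 0.05) -> float:
    """Toy conversation-level objective: task success minus a cost per turn."""
    success = 1.0 if conversation[-1] == "user: [satisfied]" else 0.0
    return success - turn_cost * len(conversation)


def multi_turn_aware_reward(
    history: Conversation, response: str, num_samples: int = 32, seed: int = 0
) -> float:
    """Average conversation-level score over sampled continuations of `response`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        total += conversation_score(simulate_rest(history + [response], rng))
    return total / num_samples


history = ["user: write me a poem"]
direct = "assistant: <generic poem>"
clarify = "assistant: What form and topic would you like?"

for name, response in [("direct", direct), ("clarify", clarify)]:
    print(name, next_turn_reward(response), multi_turn_aware_reward(history, response))
```

Under these assumed numbers the next-turn score favors the direct guess, while the rollout-based score favors the clarifying question. That reversal is the behavior the note attributes to switching reward signals.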

The connection to proactivity is also direct. Building on the related note "Why can't conversational AI agents take the initiative?", the passivity is not just a missing feature: it is actively trained in by next-turn reward optimization. Proactivity cannot simply be added on top of a training signal that rewards only reactive helpfulness.

The CollabLLM framework evaluates on three challenging tasks including document creation — contexts where multi-turn collaboration is essential and single-turn helpfulness is insufficient. This grounds the claim in practical interaction scenarios rather than abstract capability measurement.

The Intent Mismatch paper directly supports this causal mechanism: it argues premature assumptions in multi-turn conversation are rational under RLHF helpfulness training. Models construct plausible task formulations for "typical" users and produce provisional answers because the training objective penalizes evasion and rewards helpfulness. The proposed fix — a Mediator-Assistant architecture that decouples intent understanding from task execution — complements CollabLLM's reward-signal approach with an architectural intervention. Both identify next-turn optimization as the root cause; they differ on whether the fix is changing the reward (CollabLLM) or restructuring the system (Intent Mismatch).
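
The decoupling idea can be sketched in a few lines. The note does not describe the paper's actual components, so everything here (the `Intent` schema, the `mediate` and `assist` functions, and the "key: value" format for follow-up turns) is a hypothetical illustration of separating intent understanding from task execution, not the paper's design.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Intent:
    """Explicit task formulation handed from mediator to assistant (assumed schema)."""
    task: str
    constraints: Dict[str, str] = field(default_factory=dict)


def mediate(user_turns: List[str]) -> Intent:
    """Toy mediator: fold all user turns into one intent; later turns override earlier ones."""
    constraints: Dict[str, str] = {}
    for turn in user_turns[1:]:
        # Assumption: follow-up turns are written as "key: value".
        if ":" in turn:
            key, value = turn.split(":", 1)
            constraints[key.strip()] = value.strip()
    return Intent(task=user_turns[0], constraints=constraints)


def assist(intent: Intent) -> str:
    """Toy assistant: executes the explicit intent only and never sees the raw history."""
    details = ", ".join(f"{k}={v}" for k, v in sorted(intent.constraints.items()))
    return f"[{intent.task}] with {details}" if details else f"[{intent.task}]"


def respond(user_turns: List[str]) -> str:
    """Pipeline: intent understanding first, task execution second."""
    return assist(mediate(user_turns))


print(respond(["write a poem", "form: sonnet", "form: haiku", "topic: rain"]))
```

Because the assistant receives only the mediator's consolidated intent, an early assumption cannot persist in the execution step once a later turn overrides it.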

Original note title

next-turn reward optimization limits multi-turn collaboration — multi-turn-aware rewards enable models to actively uncover intent rather than passively respond