SYNTHESIS NOTE
Conversational AI and Personalization · Psychology, Society, and Alignment

Why do AI agents miss most of what users actually want?

UserBench explores why current models fully align with all user intents only 20% of the time, even when users reveal preferences across multiple turns. The underlying question is whether agents can learn to actively clarify ambiguous or evolving goals.

Synthesis note · 2026-02-23 · sourced from Design Frameworks
Why do AI conversations reliably break down after multiple turns? How should researchers navigate LLM reasoning research?

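UserBench evaluates agents in multi-turn, preference-driven interactions where simulated users start with underspecified goals and reveal preferences incrementally. The results quantify a gap that existing benchmarks obscure: agents fully align with all user intents only 20% of the time, and even the best models elicit fewer than 30% of preferences through active querying.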

The framework identifies three core traits of human communication that make this hard:

  1. Underspecification — users initiate requests before fully formulating their goals
  2. Incrementality — intent emerges and evolves across interaction turns
  3. Indirectness — users obscure or soften their true intent due to social or strategic reasons

These are not edge cases — they are the default condition of human communication. Language is inherently ambiguous (Clark, 1996; Liu et al., 2023), and meaning is co-constructed through interaction.
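
The three traits can be made concrete with a toy simulated user. This is a minimal sketch under stated assumptions, not UserBench's implementation: the `SimulatedUser` class, its fields, and the keyword-matching rule are all hypothetical and chosen only to show an underspecified opening request, preferences that surface one turn at a time, and indirect phrasing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SimulatedUser:
    """Toy user with hidden preferences (hypothetical, for illustration only)."""

    opening_request: str                # underspecified initial goal
    hidden_preferences: dict[str, str]  # preference name -> indirect phrasing
    revealed: set[str] = field(default_factory=set)

    def respond(self, agent_question: Optional[str] = None) -> str:
        """Reveal at most one not-yet-revealed preference per turn.

        Assumption: a preference is revealed only when the agent's question
        mentions its name. Otherwise the user gives a non-committal reply.
        """
        if agent_question:
            question = agent_question.lower()
            for name, phrasing in self.hidden_preferences.items():
                if name in question and name not in self.revealed:
                    self.revealed.add(name)
                    return phrasing  # indirect: hints at the preference
        return "I'm not sure, what do you suggest?"


user = SimulatedUser(
    opening_request="Help me plan a trip to Lisbon.",
    hidden_preferences={
        "budget": "I'd rather not spend a fortune on this.",
        "flight": "Long layovers tend to wear me out.",
    },
)
print(user.opening_request)
print(user.respond("Do you have a budget in mind?"))
print(user.respond("Shall I book it?"))
```

In this sketch an agent that never asks about the budget or the flight never learns either preference, which is the failure mode the traits describe.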

The disconnect between task completion and user alignment is the critical finding. Standard benchmarks measure whether an agent completes a task; UserBench measures whether the agent completes the right task from the user's perspective. Current models are task-capable but not user-aligned.
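
The difference between the two kinds of measurement can be sketched in a few lines. The metric definitions below are assumptions made for illustration and are not UserBench's actual scoring; `Episode`, its fields, and `summarize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One agent-user interaction (hypothetical record layout)."""

    task_completed: bool        # did the agent return a valid result?
    preferences_total: int      # hidden preferences the user held
    preferences_satisfied: int  # preferences the final answer respects
    preferences_elicited: int   # preferences surfaced by the agent's questions


def summarize(episodes: list[Episode]) -> dict[str, float]:
    """Compare task completion with user alignment over a set of episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    total_prefs = sum(e.preferences_total for e in episodes)
    if total_prefs == 0:
        raise ValueError("episodes contain no preferences to score")

    # What a standard benchmark would report.
    task_completion = sum(e.task_completed for e in episodes) / n
    # Assumed definition: completed AND every preference satisfied.
    full_alignment = sum(
        e.task_completed and e.preferences_satisfied == e.preferences_total
        for e in episodes
    ) / n
    # Share of all preferences the agent surfaced by asking.
    active_elicitation = sum(e.preferences_elicited for e in episodes) / total_prefs

    return {
        "task_completion": task_completion,
        "full_alignment": full_alignment,
        "active_elicitation": active_elicitation,
    }
```

With made-up data in which every episode completes but only one in five satisfies all preferences, `task_completion` is 1.0 while `full_alignment` is 0.2, so a completion-only benchmark would hide the gap entirely.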

This connects to the note "Why can't users articulate what they want from AI?": the 20% figure quantifies the double gap. Read alongside "How do users actually form intent when prompting AI systems?", the incrementality trait confirms that treating intent as binary is a design error, not an edge case.

The finding that models elicit fewer than 30% of preferences through active querying connects to the note "Can models learn to ask clarifying questions instead of guessing?": proactive questioning is trainable (0.15% → 73.98%) but is not standard in current deployments.

Original note title

agents fully align with all user intents only 20 percent of the time — even best models elicit fewer than 30 percent of preferences through active querying