Why can't conversational AI agents take the initiative?
Explores whether current LLMs lack the structural ability to lead conversations, set goals, or anticipate user needs—and what architectural changes might enable proactive dialogue.
Three independent research programs converge on the same diagnosis: current LLM-based conversational agents, including ChatGPT and GPT-4, are fundamentally reactive. They respond to user queries but cannot initiate conversations, shift topics strategically, plan with subgoals, or offer recommendations that account for context beyond the current exchange.
The definition of proactivity comes from organizational behavior: "the capability to create or control the conversation by taking the initiative and anticipating impacts on themselves or human users." This is a well-defined property, not a vague aspiration — and it is systematically absent.
The gap matters most in situations requiring active engagement from both sides: exploratory search, complex decision-making, creative problem-solving. In these contexts, a purely reactive agent forces the user to carry the entire strategic burden of the conversation. The user must know what to ask, when to redirect, and how to structure the exchange, and these are precisely the situations where users most need help doing so.
The structural cause is training: LLMs are trained to follow user instructions and generate next-turn responses. This produces impressive reactive capability but no mechanism for initiative. Even "proactive" features like topic suggestion are reactive: they are triggered by user input rather than driven by agent goals. The distinction is between responding to what the user has said and originating moves from the agent's own goals.
As the related note "Does preference optimization harm conversational understanding?" argues, single-turn helpfulness training actively works against multi-turn strategic behavior. The passive architecture is not just a missing feature; it is reinforced by the training objective. And as "Why do language models sound fluent without grounding?" shows, the absence of initiative is further masked: models that skip clarifying questions, acknowledgments, and understanding checks sound more authoritative precisely because they perform less communicative work.
The practical consequence: proposed methods for enabling proactivity, including learning to ask clarifying questions, topic shifting, and strategy planning with RL, remain research proposals. The deployed state of conversational AI is passive by default.
A comprehensive survey (Deng et al., 2023) formalizes three subtasks for proactive dialogue systems: topic-shift detection (when to transition), topic planning (which path to follow), and topic-aware response generation (producing goal-directed utterances). Target types range from topical keywords to knowledge entities to full conversational goals. Yet even this taxonomy remains underexplored in deployed systems.
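The three subtasks in that taxonomy can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the survey's method: the topic graph, the turn-count rule for shift detection, and every name below (`DialogueState`, `TOPIC_GRAPH`, `should_shift`, `plan_path`, `respond`) are hypothetical, and a string template stands in for an LLM call.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DialogueState:
    current_topic: str
    target_topic: str      # the agent's own goal for the conversation
    turns_on_topic: int


# Hypothetical topic graph: each key lists the topics reachable from it.
TOPIC_GRAPH = {
    "weather": ["travel"],
    "travel": ["hotels", "food"],
    "hotels": ["booking"],
    "food": [],
    "booking": [],
}


def should_shift(state, max_turns=2):
    """Topic-shift detection: toy rule that fires after max_turns on one topic."""
    off_target = state.current_topic != state.target_topic
    return off_target and state.turns_on_topic >= max_turns


def plan_path(graph, start, goal):
    """Topic planning: breadth-first search for the shortest topic path."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return []  # goal not reachable from start


def respond(state, user_utterance):
    """Topic-aware response generation; returns (reply, topic after this turn).

    user_utterance is accepted for interface realism but unused by the template.
    """
    if should_shift(state):
        path = plan_path(TOPIC_GRAPH, state.current_topic, state.target_topic)
        if len(path) > 1:
            nxt = path[1]  # take one step along the planned path
            reply = f"Speaking of {state.current_topic}, have you thought about {nxt}?"
            return reply, nxt
    return f"Tell me more about {state.current_topic}.", state.current_topic
```

Calling `respond(DialogueState("weather", "booking", 2), "It is sunny today.")` moves one step along the planned path, from "weather" to "travel". The point of the sketch is the separation of concerns: when to shift, where to go, and what to say are three distinct decisions, and a purely reactive agent makes only the last one.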
The efficiency cost of passivity is quantifiable: in simulations of task-oriented domains of medium complexity, proactive behavior reduces dialogue turns by up to 60%. As "Could proactive dialogue make conversations dramatically more efficient?" notes, the absence is not just a capability gap but a data gap: proactivity is under-represented in training datasets, so models rarely encounter examples of it.
Two new architectural responses to this diagnosis have emerged. The Inner Thoughts framework reverses the question from "who speaks next?" to "does the agent have something worth saying?" — equipping AI with a continuous covert thought stream and intrinsic motivation scoring (preferred by humans 82% of the time). DiscussLLM takes the complementary approach: training a "silent token" prediction so models explicitly learn when NOT to intervene, formalizing the silence/speak decision as a classification task. Both recognize that the missing capability is not generating better responses but deciding whether to respond at all.
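The shared idea behind both approaches, deciding whether to speak before deciding what to say, can be illustrated with a toy gate. This is a sketch under loud assumptions, not either paper's implementation: the three scoring factors, their equal weighting, the 0.6 threshold, and the `<silent>` marker are all hypothetical stand-ins for a learned motivation score and a learned silent token.

```python
from dataclasses import dataclass


@dataclass
class Thought:
    text: str
    relevance: float  # 0..1, how related to the current conversation
    novelty: float    # 0..1, how much it adds beyond what was already said
    urgency: float    # 0..1, how time-sensitive it is


SILENT = "<silent>"  # hypothetical marker standing in for a learned silent token


def motivation(thought):
    """Toy intrinsic-motivation score: unweighted mean of three assumed factors."""
    return (thought.relevance + thought.novelty + thought.urgency) / 3


def decide(thoughts, threshold=0.6):
    """Speak the best covert thought only if its score clears the threshold."""
    if not thoughts:
        return SILENT
    best = max(thoughts, key=motivation)
    return best.text if motivation(best) >= threshold else SILENT
```

In this framing the agent generates candidate thoughts continuously, whether or not it was addressed, and the gate turns the silence/speak choice into an explicit decision rather than a side effect of turn-taking.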
ProAgent: intention inference as proactivity mechanism (from Arxiv/Agents Multi)
ProAgent addresses passivity through a hierarchical intention inference pipeline designed specifically for cooperative multi-agent settings. Its five-stage process represents a concrete architecture for proactive behavior:
- Stage 1, Knowledge Library and State Grounding: transforming raw state into language descriptions
- Stage 2, High-level Skill Planning: analyzing the scene and inferring teammate intentions
- Stage 3, Belief Correction: updating beliefs based on observed actual behavior
- Stage 4, Skill Validation: checking and replanning if needed
- Stage 5, Memory Storage: accumulating decision context
The belief correction mechanism is key: rather than assuming static teammate behavior, ProAgent dynamically adjusts beliefs about partner intentions based on discrepancies between predicted and observed actions. This enables zero-shot coordination with unfamiliar teammates, addressing the passivity problem not through learned conversational initiative but through real-time social modeling.
The distinction matters: passivity in human-AI interaction (failing to lead conversation) and passivity in AI-AI cooperation (failing to anticipate teammates) have different surface manifestations but share the same root cause, namely the absence of goal-aware, other-modeling behavior.
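The belief correction step can be illustrated numerically. The sketch below is an assumption-laden stand-in, not ProAgent's implementation: it represents the belief about a teammate's next action as a probability table and shifts mass toward the observed action whenever the prediction misses. The function names, the `step` parameter, and the action labels are all hypothetical.

```python
def predict(belief):
    """Return the teammate action currently believed most likely."""
    return max(belief, key=belief.get)


def correct_belief(belief, predicted, observed, step=0.5):
    """Shift probability mass toward the observed action when the prediction missed.

    belief maps action name -> probability and is assumed to sum to 1.
    """
    if predicted == observed:
        return dict(belief)  # prediction held, so nothing to correct
    # Scale every entry down, then give the freed mass to the observed action.
    updated = {action: p * (1 - step) for action, p in belief.items()}
    updated[observed] = updated.get(observed, 0.0) + step
    return updated


# Hypothetical teammate actions in a cooperative task.
belief = {"chop": 0.7, "plate": 0.3}
guess = predict(belief)                            # "chop"
belief = correct_belief(belief, guess, "plate")    # teammate actually plated
```

After one mismatch the table favors "plate", so the next plan is built around what the teammate actually does rather than what the agent first assumed. The update keeps the probabilities summing to 1 because the existing mass is scaled by `1 - step` before `step` is added back.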
Production agent deployment gap (from Arxiv/Agents)
OpenAgents' real-world deployment reveals three concrete instantiations of passivity beyond conversational initiative. First, effective application specification via prompting requires instructions that cater to backend logic, output aesthetics, and adversarial safeguards; the instruction volume can exceed token limits, meaning agents cannot fully specify their own operational context. Second, real-time interactive scenarios like streaming are essential for acceptable user experience but are complex to engineer with current LLM architectures. Third, current research gravitates toward idealized performance metrics while sidelining critical trade-offs between system responsiveness and accuracy, as well as the nuanced complexities of application-based failures. The gap between benchmarked and deployed agent performance is systematic, not incidental. As "Why do AI agents fail at workplace social interaction?" reports, the 30% completion figure confirms that real-world complexity surfaces failures invisible in benchmarks.
Inquiring lines that use this note as a source 80
This note is a source for these synthesized inquiries.
- What makes conceptual inquiry the fastest high-scoring AI interaction pattern?
- Can AI ever lead conversations without the anticipatory presence sustained attention provides?
- When should an AI system actively intervene versus remain silent?
- Can timing and context awareness reduce the cognitive cost of AI suggestions?
- How does multi-turn conversation degrade AI intent alignment?
- Why can't users and AI articulate shared goals together?
- Why does preference optimization erode conversational grounding in AI assistants?
- What would an AI trained for emancipatory reasoning look like?
- What makes human-LLM exchange closer to oracle-consultation than dialogue?
- Can parallel agents or complementary mechanisms replace single-human interrogation of LLMs?
- Can curiosity-driven dialogue incrementally discover user interest journeys in real time?
- Can AI be used as a channel for human-initiated alarm?
- How does treating AI as an agent affect user autonomy and decision-making?
- What dialogue dynamics distinguish negotiation from standard information-provision tasks?
- Can systems guide users adaptively without imposing predetermined dialogue structures?
- Can a proposer agent actively surface a solver's weaknesses to prevent plateau?
- What memory and planning capabilities do AI companions need for evolving user needs?
- Why do LLM agents make promises without executing them?
- What architectural changes would enable proactive therapeutic guidance in chatbots?
- Why do current conversational AI systems fail to develop shared vocabulary with users?
- What design discipline replaces navigation and layout in AI systems?
- Can users articulate what they want before AI helps them discover it?
- Why do large language models fail at taking conversational initiative?
- Can AI learn when to speak in a conversation?
- Why do LLM agents fail where game-theoretic bots succeed?
- How does intrinsic motivation drive conversational agents beyond passive responsiveness?
- Why can't current AI agents lead conversations with users?
- Why do passive conversational agents fail at collaborative decision-making?
- When should agents use clarification commands instead of assuming intent?
- Can API-first interaction replace traditional UI-based agent interfaces?
- Can next-state supervision work across different agent interaction types like conversations and tool calls?
- Why does the commentariat reason about AI using vocabulary for smart agents?
- Can prompt engineering overcome the gulf between user intent and AI interpretation?
- Why might text-only interfaces underestimate agent preference elicitation capabilities?
- What interaction controls matter most for effective human-LLM collaboration?
- Can conversation analysis predict when agents should ask users for clarification?
- Why can't AI participate in real communicative events?
- Can conversational AI achieve mutual understanding if trained only on text?
- Can real-time linguistic coordination tracking improve conversational AI quality?
- What data would be needed to train proactive conversational systems?
- Do agent frameworks adequately compensate for LLM conversational passivity?
- What interaction design changes would help LLMs handle underspecified requests?
- Can hierarchical reinforcement learning manage phase-dependent initiative switching in dialogue?
- Do behavioral cues enable proactive AI without event-triggered decision points?
- Can architectural changes like decoupling intent understanding help overcome next-turn reward limitations?
- Can dialogue agents be reliable but still feel inflexible or cold?
- Do LLM conversational agents currently detect and prevent derailment trajectories?
- Can proactive AI agents deploy politeness strategies without appearing intrusive?
- Can offline RL and pragmatic inference together improve dialogue agent reliability?
- How can agents detect whether users are willing to follow their topic guidance?
- What makes proactive conversational agents feel intrusive versus helpful to users?
- What social boundaries must proactive agents respect during conversation?
- When should agents accommodate user preferences over their own goals?
- How do conversational agents overcome structural passivity and goal awareness gaps?
- Why are task-oriented dialogue datasets systematically underrepresenting human proactive behavior?
- Does proactive agent design improve conversation efficiency or create user frustration?
- Can agents balance goal-driven proactivity with user preference alignment?
- Can users articulate their intent before exploring what an AI system finds?
- Can LLMs coordinate with humans better using different model architectures?
- How can dialogue structure and trajectory predict social agent performance?
- Can prompt engineering close the gap between AI structure and evaluative commitment?
- Do conversational agents need goal awareness to initiate grounding work themselves?
- Can conversational prompt engineering bridge the articulation gap?
- Which AI capabilities matter most for human-facing deployment contexts?
- Why do AI products default to service roles when users seek different kinds of help?
- What multi-turn reward structures would encourage active intent discovery?
- Can AI take initiative by questioning without being proactive in directive ways?
- What communicative work do fluent conversations perform that AI systems skip?
- Why does single-turn Q&A framing not match real user deployment patterns?
- How does local helpfulness per turn conflict with maintaining session-level conversational goals?
- Why do conversational agents lack the goal awareness needed to lead rather than just respond?
- How should AI interfaces signal their non-communicative nature to users?
- Can role-aligned AI systems replicate an expert's sense of audience and moment?
- Can structural conversation analysis replace text-based reward signals for AI alignment?
- How does conversational context fail as an authorization enforcement layer?
- What are the key interaction mechanisms that make human-agent collaboration work?
- How can agents learn user preferences during conversation without pre-calibration?
- What behavioral signals let users detect communicative flexibility in AI?
- What distinguishes communicative acts from operational actions in agentic LLMs?
- Why do standard next-token prediction models struggle with conversational initiative?
Related concepts in this collection 12
- Does preference optimization harm conversational understanding?
Exploring whether RLHF training that rewards confident, complete responses undermines the grounding acts—clarifications, checks, acknowledgments—that actually build shared understanding in dialogue.
single-turn training reinforces passivity
- Do language models actually build shared understanding in conversation?
When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.
another manifestation of reactive design; no active grounding effort
- Can AI agents learn when they have something worth saying?
What if AI proactivity came from modeling intrinsic motivation to participate rather than predicting who speaks next? This explores whether a framework based on human cognitive patterns—internal thought generation parallel to conversation—can make agents genuinely responsive rather than passively reactive.
strongest architectural answer: covert thought generation + intrinsic motivation
- Can models learn when NOT to speak in conversations?
Does training AI to explicitly predict silence—through a dedicated silent token—help models understand when intervention adds value versus when they should stay quiet? This matters for building conversational agents that feel naturally helpful rather than intrusive.
complementary approach: explicit silence/speak classification
- Why do language models fail in gradually revealed conversations?
Explores why LLMs perform 39% worse when instructions arrive incrementally rather than upfront, and whether they can recover from early mistakes in multi-turn dialogue.
39% multi-turn degradation is the empirical cost of passivity
- Could proactive dialogue make conversations dramatically more efficient?
Explores whether AI systems that volunteer relevant unrequested information could significantly reduce the back-and-forth turns required in task-oriented conversations, and why this behavior is missing from training data.
quantifies the efficiency cost of passivity
- When should proactive agents push toward their goals versus accommodate users?
Proactive dialogue agents face a tension between reaching their objectives efficiently and keeping users satisfied. This question explores whether these two aims can coexist or require constant negotiation.
proactivity creates new challenges when users are non-cooperative
- Why do language models sound fluent without grounding?
Explores whether LLM fluency masks the absence of communicative work—the clarifying questions, acknowledgments, and understanding checks that humans perform. Why does skipping these acts make models sound more confident?
passivity and the grounding gap are complementary: passivity describes the absence of initiative; the grounding gap describes the absence of communicative accountability; both are training consequences that get rewarded as fluency
- Does RLHF training push therapy chatbots toward problem-solving?
Explores whether reward signals optimizing for task completion in RLHF inadvertently train therapeutic chatbots to prioritize solutions over emotional validation, potentially undermining clinical effectiveness.
in therapeutic contexts passivity combines with the problem-solving bias: the model only responds (passive) and when it does it defaults to task completion (problem-solving); the clinical need is for initiative toward emotional attunement
- Do LLMs predict persuasion based on actual dialogue or training bias?
Why do large language models consistently predict concession-based persuasion intentions even when dialogue context suggests otherwise? Understanding this gap reveals how alignment training shapes not just model behavior but also how models perceive others' intentions.
the alignment-induced passivity extends to social modeling: RLHF not only makes agents passive in behavior but biases their predictions about others toward accommodation, projecting trained conciliatory disposition onto the agents they model
- Why do standard alignment methods ignore partner interventions?
Standard RLHF and DPO optimize for token-level quality but may structurally prevent agents from meaningfully incorporating partner input. This explores whether the training objective itself blocks collaborative reasoning.
ICR demonstrates the deeper mechanism: RLHF structurally cannot produce partner-aware collaboration; passivity toward partner contributions is a trained-in property, not a missing feature
- Can models learn to ask clarifying questions without explicit training?
Do language models trained only on fully-specified problems spontaneously develop the ability to ask for missing information when facing underspecified tasks? This tests whether conversational problem-solving strategies emerge from meta-learning rather than direct instruction.
direct training-level answer to the passivity diagnosis. Social meta-learning converts static problems into pedagogical dialogues with an information-asymmetric teacher; the resulting models proactively ask clarifying questions on underspecified tasks despite never being trained on underspecified problems. This moves "learning to ask" from research proposal to demonstrated training pattern — the passivity problem is addressable at the training level, not only via runtime architecture (Inner Thoughts) or prompt engineering.
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Proactive Conversational Agents in the Post-ChatGPT World
- Proactive Conversational Agents with Inner Thoughts
- Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents
- Rethinking Conversational Agents in the Era of LLMs: Proactivity, Non-collaborativity, and Beyond
- Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration
- DiscussLLM: Teaching Large Language Models When to Speak
- AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges
- TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation
Original note title
llm-based conversational agents are structurally passive — they lack goal awareness initiative-taking and the ability to lead conversation beyond responding to user queries