Why can't chatbots detect when users are ambivalent about change?
Explores whether LLMs fail to recognize early-stage motivational states during behavior change conversations, and why this matters for people who need support most.
The Transtheoretical Model defines five motivational stages: precontemplation (resistance or unawareness), contemplation (increased awareness but ambivalence), preparation (intention with small steps), action (initiation with commitment), and maintenance (sustained change). Testing ChatGPT, Bard, and Llama 2 across 25 health behavior scenarios revealed a structured asymmetry: LLMs provide relevant information when users have established goals and commitment (later stages) but fail to recognize motivational states and provide appropriate guidance when users are hesitant or ambivalent (earlier stages).
This failure runs deeper than the face-saving accommodation described in "Why do language models avoid correcting false user claims?". There, the model accommodates a user it could have corrected; here, it fails to detect that the user is ambivalent in the first place. A human counselor recognizes "I know I should exercise but..." as contemplation-stage talk that calls for a different intervention than "I've started a running program." The LLM treats both as requests for information about exercise.
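The contrast between the two utterances can be made concrete with a small sketch. This is a toy illustration, not the method used in the study: the cue phrases, the function names guess_stage and response_mode, and the response labels are all hypothetical, and a real system would need far more than surface matching to detect ambivalence.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """The five motivational stages, using the standard stage names."""
    PRECONTEMPLATION = 1  # resistance / unawareness
    CONTEMPLATION = 2     # increased awareness but ambivalence
    PREPARATION = 3       # intention with small steps
    ACTION = 4            # initiation with commitment
    MAINTENANCE = 5       # sustained change


# Hypothetical surface cues, chosen only to cover the two example utterances.
CUES = [
    (Stage.CONTEMPLATION, ("i know i should", "not sure if")),
    (Stage.ACTION, ("i've started", "i have started")),
]


def guess_stage(utterance: str) -> Optional[Stage]:
    """Return the first stage whose cue appears in the utterance, else None."""
    text = utterance.lower()
    for stage, cues in CUES:
        if any(cue in text for cue in cues):
            return stage
    return None


def response_mode(stage: Optional[Stage]) -> str:
    """Pick a response style from the guessed stage (labels are assumptions)."""
    if stage is None:
        return "ask a clarifying question"
    if stage in (Stage.PRECONTEMPLATION, Stage.CONTEMPLATION):
        return "explore ambivalence"
    return "provide information"


ambivalent = guess_stage("I know I should exercise but...")
committed = guess_stage("I've started a running program.")
print(response_mode(ambivalent))
print(response_mode(committed))
```

A stage-unaware system corresponds to skipping guess_stage and always answering in the "provide information" mode, which is the behavior the note describes for both utterances.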
The gap is not limited to the early stages. Even for users already making changes, LLMs fail to provide information about reward systems for maintaining motivation or about environmental stimulus control for preventing relapse. The models default to suggesting external help (social support, professional resources) rather than intrinsic regulation strategies.
This connects to "Does any single persuasion technique work for everyone?": motivational stage is another dimension of individual variation that determines which interventions work. It also explains why empathetic chatbots may systematically fail the people who most need support: those at the earliest stages of behavior change, where resistance and ambivalence are the presenting features.
Inquiring lines that use this note as a source (36)
This note is a source for these synthesized inquiries.
- How does emotional dependence on chatbots affect user wellbeing?
- Why do positive response patterns in chatbots reinforce harmful user behaviors?
- Why do mental health chatbots fail at synchrony despite strong language models?
- What harms might chatbots cause through stigma expression and delusion reinforcement?
- Do therapeutic chatbots adequately detect crisis situations and safety risks?
- How do dropout rates and low adherence affect chatbot therapy outcomes?
- How does the expectation ratchet affect long-term chatbot satisfaction?
- What architectural changes would enable proactive therapeutic guidance in chatbots?
- Why do persistent chatbot companions face novelty decay that ad-hoc supporters avoid?
- Can real-time detection identify when users have incomplete or underdeveloped intent?
- How does conversation drift from original goals affect user satisfaction?
- How should dialogue state tracking change when user preferences shift mid-conversation?
- How does intrinsic motivation drive conversational agents beyond passive responsiveness?
- Can AI recognize and support behavior change in users without established commitment?
- Can Pennebaker's expressive writing framework explain all chatbot symptom improvements?
- How should systems learn what each meeting participant actually cares about?
- Do empathetic chatbots systematically fail people at earliest behavior change stages?
- How does motivational stage determine which interventions actually work for users?
- Why do chatbots default to external help instead of intrinsic motivation strategies?
- Can LLMs adapt persuasion strategies when they cannot track the listener's state?
- What metrics measure whether emotional support conversations actually reduce user distress?
- Why do chatbots fail to recognize when someone is ambivalent about change?
- Which chatbot archetypes actually experience novelty decay in practice?
- Why does face-saving avoidance drive chatbots to agree rather than confront?
- Do LLM chatbots repeat this failure through comfort instead of clinical challenge?
- How can agents detect whether users are willing to follow their topic guidance?
- What makes proactive conversational agents feel intrusive versus helpful to users?
- How do task-type perceptions like chat versus reasoning guide different reward strategies?
- Can alternative reward functions shift LLMs from problem-solving to genuinely empathic responses?
- What reward signals would better align chatbots with actual therapeutic practice?
- Can a text-only chatbot feel socially present without visual embodiment?
- What problematic counselor behaviors prevent alliance from deepening in text?
- Should chatbots be designed as therapist support tools rather than replacements?
- How do alignment techniques bias therapeutic chatbots toward task completion?
- Why do conversational agents lack the goal awareness needed to lead rather than just respond?
- Can preference optimization training limit chatbot emotional disclosure capability?
Related concepts in this collection (3)
- Why do language models avoid correcting false user claims?
  Explores whether LLM grounding failures stem from missing knowledge or from conversational dynamics. Examines whether models use face-saving strategies similar to humans when disagreement is needed.
  Relation: face-saving operates at a different level: accommodation vs. detection failure
- Does any single persuasion technique work for everyone?
  Can fixed persuasion strategies like appeals to authority or social proof be reliably applied across different people and situations, or do they require adaptation to individual traits and context?
  Relation: motivational stage as another individual variation dimension
- Do large language models genuinely simulate mental states?
  This explores whether LLMs perform authentic theory of mind reasoning or rely on surface-level pattern matching. The distinction matters because evaluation format (multiple-choice versus open-ended) reveals very different capability levels.
  Relation: inability to detect ambivalence is a ToM failure in natural dialogue
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Study: Large language models can’t effectively recognize users’ motivation, but can support behavior change for those ready to act
- Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers
- Supporting Physical Activity Behavior Change with LLM-Based Conversational Agents
- Rethinking Large Language Models in Mental Health Applications
- Tailored Conversations beyond LLMs: A RL-Based Dialogue Manager
- A Computational Framework for Behavioral Assessment of LLM Therapists
- Goal Alignment in LLM-Based User Simulators for Conversational AI
- The Levers of Political Persuasion with Conversational AI
Original note title
LLMs fail to recognize early-stage motivational states but support behavior change for users with established goals and commitment