SYNTHESIS NOTE

Why can't chatbots detect when users are ambivalent about change?

Explores whether LLMs fail to recognize early-stage motivational states during behavior change conversations, and why this matters for people who need support most.

Synthesis note · 2026-02-22 · sourced from Psychology Empathy

The Transtheoretical Model defines five motivational stages: resistance or unawareness (precontemplation), increased awareness with ambivalence (contemplation), intention with small steps (preparation), initiation with commitment (action), and sustained change (maintenance). Testing ChatGPT, Bard, and Llama 2 across 25 health behavior scenarios revealed a systematic asymmetry. The LLMs provide relevant information when users have established goals and commitment (later stages). However, they fail to recognize the user's motivational state, or to offer appropriate guidance, when users are hesitant or ambivalent (earlier stages).

This is a face-saving failure at a deeper level than the one explored in "Why do language models avoid correcting false user claims?". The model does not merely accommodate the user; it cannot detect that the user is ambivalent in the first place. A human counselor recognizes "I know I should exercise but..." as contemplation-stage talk that calls for a different intervention than "I've started a running program." The LLM treats both as requests for information about exercise.

The gap is not limited to the early stages. Even for users already making changes, LLMs fail to provide information about reward systems for sustaining motivation or about environmental stimulus control for preventing relapse. The models default to suggesting external help (social support, professional resources) rather than intrinsic regulation strategies.

This connects to "Does any single persuasion technique work for everyone?": motivational stage is another dimension of individual variation that determines which interventions work. It also explains why empathetic chatbots may systematically fail the people who most need support: those at the earliest stages of behavior change, where resistance and ambivalence are the presenting features.

Inquiring lines that read this note

This note is a source for research framings under the following broader lines of inquiry.

How do chatbots affect human self-disclosure and emotional engagement?

Why do LLM chatbots fail as independent therapeutic agents?

How should conversational agents balance goal-driven initiative with user control?

How should dialogue recommender systems manage conversation history and state?

How do we evaluate AI systems when user perception misleads actual performance?

Can AI recognize and support behavior change in users without established commitment?

Can AI systems balance emotional competence with factual reliability?

What makes AI persuasion effective and how can we counter it?

How does motivational stage determine which interventions actually work for users?

How does rhetorical adaptation affect LLM persuasion and detectability?

Can LLMs adapt persuasion strategies when they cannot track the listener's state?

How can real-time alliance measurement improve therapy outcomes?

How can conversational AI maintain consistent personas across conversations?

Which chatbot archetypes actually experience novelty decay in practice?

What properties determine whether reward signals teach genuine reasoning?

How do task-type perceptions like chat versus reasoning guide different reward strategies?

Related concepts in this collection


Why do language models avoid correcting false user claims? Explores whether LLM grounding failures stem from missing knowledge or from conversational dynamics. Examines whether models use face-saving strategies similar to humans when disagreement is needed.
face-saving operates at a different level: accommodation vs. detection failure
Does any single persuasion technique work for everyone? Can fixed persuasion strategies like appeals to authority or social proof be reliably applied across different people and situations, or do they require adaptation to individual traits and context?
motivational stage as another individual variation dimension
Do large language models genuinely simulate mental states? This explores whether LLMs perform authentic theory of mind reasoning or rely on surface-level pattern matching. The distinction matters because evaluation format—multiple-choice versus open-ended—reveals very different capability levels.
inability to detect ambivalence is a ToM failure in natural dialogue


Original note title

LLMs fail to recognize early-stage motivational states but support behavior change for users with established goals and commitment
