INQUIRING LINE

AI coaching works well once you've decided to change — but it can't detect users who are still on the fence.

How does motivational stage determine which interventions actually work for users?

This explores the idea that behavior-change interventions aren't universally effective — what helps a user depends on where they are in the arc of changing (ambivalent vs. committed) — and asks what the corpus says about whether AI systems can read that stage and match their response to it.

Behavior-change interventions aren't one-size-fits-all: a user who is merely ambivalent about quitting smoking needs something very different from a user who has already committed and is fighting relapse. The corpus's clearest finding is that this stage-matching is exactly where AI systems break down. Tests of three major LLMs across 25 health scenarios showed that they perform well only once a user has an established goal. They cannot detect resistance, ambivalence, or the early motivational states in which a person hasn't yet decided to change, and they miss relapse-prevention strategies even for users who are deep into the action phase (see "Why can't chatbots detect when users are ambivalent about change?"). In other words, the models are competent coaches for the already-motivated and nearly blind to everyone else.

The more interesting twist is *why* this happens, and here a seemingly unrelated note connects laterally. RLHF training rewards task completion and solution-giving, so therapeutic chatbots are systematically pushed toward problem-solving and away from emotional validation (see "Does RLHF training push therapy chatbots toward problem-solving?"). That training bias is almost perfectly mismatched to early-stage users: someone who is ambivalent doesn't need a plan; they need to feel heard. So the same optimization that makes a model helpful to an action-stage user actively harms a contemplation-stage one. The detection failure and the intervention bias compound each other: the model can't see that the user is ambivalent, and even if it could, its instincts would steer it toward the wrong move.

What would matching look like if done well? One note offers a concrete architecture: an RL system trained on the therapeutic *working alliance* (the task, bond, and goal dimensions of a relationship) that generates disorder-specific policies and recommends in real time which strategy to deploy next (see "Can reinforcement learning optimize therapy dialogue in real time?"). This reframes the problem from "give good advice" to "read the relational state and pick the stage-appropriate move," which is much closer to how skilled clinicians actually adapt. Structured, multi-stage reasoning can also help systems form a clinical picture before acting: separating subjectivity assessment from contrastive reasoning sharply improved cognitive-distortion detection and produced explanations that clinicians found useful for case formulation (see "Can structured prompting improve cognitive distortion detection?").
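To make "read the relational state and pick the stage-appropriate move" concrete, here is a minimal sketch. It is a hand-written rule table, not the RL system the note describes, and every name in it (`Stage`, `Alliance`, `STAGE_MOVES`, `pick_move`, the `bond_floor` threshold, and the 0-to-1 score scale) is a hypothetical chosen for illustration. It assumes the stage has already been detected, which is exactly the step the corpus says current models fail at.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    CONTEMPLATION = "contemplation"  # ambivalent, has not decided to change
    ACTION = "action"                # committed and actively changing
    MAINTENANCE = "maintenance"      # sustaining change, guarding against relapse


@dataclass
class Alliance:
    """Working-alliance scores; the [0, 1] scale is an assumption."""
    task: float
    bond: float
    goal: float


# Hypothetical stage-to-move table, for illustration only.
STAGE_MOVES = {
    Stage.CONTEMPLATION: "validate",
    Stage.ACTION: "plan",
    Stage.MAINTENANCE: "relapse_prevention",
}


def pick_move(stage: Stage, alliance: Alliance, bond_floor: float = 0.5) -> str:
    """Pick the next move from the stage, falling back to validation when bond is weak."""
    if alliance.bond < bond_floor:
        # A weak bond overrides the stage default: repair the relationship first.
        return "validate"
    return STAGE_MOVES[stage]


if __name__ == "__main__":
    strong = Alliance(task=0.8, bond=0.8, goal=0.8)
    print(pick_move(Stage.CONTEMPLATION, strong))
    print(pick_move(Stage.ACTION, strong))
```

The point of the sketch is the shape of the decision: the move depends on the user's stage and the relational state, not on whether a solution happens to be available. A learned policy would replace the fixed table and threshold.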

Two cautionary notes round out the picture and point at how we'd even *know* whether stage-matching works. First, AI persuasiveness decays across repeated interactions with the same person, the opposite of human persuasion, where influence typically grows as rapport builds (see "Does AI persuasiveness fade across repeated conversations with the same person?"). That suggests AI is weakest precisely at the sustained, relationship-dependent work that the maintenance and relapse-prevention stages demand. Second, much of the evidence claiming chatbots "work" is built on waitlist-controlled trials that measure conversational contact rather than any therapy-specific mechanism, so an effect that looks like a working intervention may just be the effect of being talked to at all (see "Do chatbot trials against waitlists measure real therapeutic value?").

The thing worth walking away with: the bottleneck isn't generating good interventions, it's *perceiving the user's readiness for them.* The corpus suggests that current AI is structurally tilted toward the late-stage, solution-hungry user, and that claims of success rest on evaluation methods that can't tell stage-matching apart from generic attention.

Sources 6 notes

Why can't chatbots detect when users are ambivalent about change?

Testing three major LLMs across 25 health scenarios showed they succeed only when users have established goals but cannot detect resistance or ambivalence. Models miss relapse-prevention strategies even for users in action stages.

Does RLHF training push therapy chatbots toward problem-solving?

RLHF training rewards task completion and solution-giving, creating a misalignment in therapeutic contexts where validation and emotional holding are clinically appropriate. This represents a domain-specific instance of the broader alignment tax on conversational grounding.

Can reinforcement learning optimize therapy dialogue in real time?

R2D2 demonstrates that RL agents trained on multi-objective working alliance scores can generate disorder-specific policies that recommend treatment strategies in real time. The system operates as an AI supervisor, transcribing sessions and recommending next topics based on task, bond, and goal alignment.

Can structured prompting improve cognitive distortion detection?

DoT prompting separates subjectivity assessment, contrastive reasoning, and schema analysis to achieve 10%+ improvement over zero-shot ChatGPT. Expert evaluators rated the resulting explanations as clinically useful for case formulation.

Does AI persuasiveness fade across repeated conversations with the same person?

Claude and DeepSeek showed strong initial persuasive advantage, but this edge eroded across repeated quiz rounds while human persuaders maintained consistent effectiveness. This decay pattern is opposite to human-to-human persuasion, where rapport typically strengthens over time.

Do chatbot trials against waitlists measure real therapeutic value?

Comparing therapeutic chatbots to waitlist or psychoeducation controls creates false efficacy claims by measuring conversational contact rather than therapy-specific mechanisms. ELIZA matching Woebot performance demonstrates this; real evidence requires comparative trials against existing treatments and mechanism identification.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a clinical AI researcher re-testing whether motivational-stage matching in behavior-change interventions remains a constraint on LLM therapists, or whether recent capability shifts have relaxed it.

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2025. A curated library identified these core constraints:
• LLMs reliably fail to detect early motivational states (ambivalence, resistance, pre-contemplation) and cannot select stage-appropriate interventions; they perform well only when users arrive with established goals (2024–2025).
• RLHF training systematically biases therapeutic chatbots toward problem-solving over emotional validation, mismatching users in early stages who need to feel heard, not advised (2023–2024).
• LLM persuasiveness decays across repeated interactions with the same person—the opposite of human rapport—suggesting AI cannot sustain the relationship-dependent work of relapse prevention and maintenance (2024).
• Evaluation of chatbot efficacy relies on waitlist-controlled designs measuring conversational contact, not therapy-specific mechanisms, conflating stage-matching with generic attention effects (2023–2024).

Anchor papers (verify; mind their dates):
• arXiv:2303.09601 (2023-03) — RL-based recommendation for therapeutic working alliance.
• arXiv:2310.07146 (2023-10) — Cognitive distortion detection via structured three-stage prompting.
• arXiv:2401.00820 (2024-01) — Behavioral assessment framework for LLM therapists.
• arXiv:2504.18412 (2025-04) — Stigma and unsafe responses in mental-health LLMs.

Your task:
(1) RE-TEST EACH CONSTRAINT. For motivational-stage *detection*: has instruction-tuning, chain-of-thought, or multi-turn dialogue memory since improved early-stage recognition? For *intervention selection*: have recent RLHF variants (e.g., process-reward models, outcome-weighted training) de-biased problem-solving away from validation? For *repetition decay*: do newer architectures (e.g., long-context, retrieval-augmented, agent-loop memory) restore or sustain rapport? Separate what remains hard from what tooling or training has dissolved.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last 6 months—e.g., does arXiv:2505.14674 or arXiv:2509.21240 imply stage-detection or stage-matching is now tractable?
(3) Propose 2 research questions that assume the regime may have moved: e.g., *given* better stage detection, what training target (beyond problem-solving) would optimize for stage-appropriate intervention? Or, *if* persuasion decay is real, can multi-agent setups (human + AI relay) overcome it?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
