SYNTHESIS NOTE
Psychology, Society, and Alignment

Do language models add feelings users never actually expressed?

GPT-based models in therapeutic contexts appear to interpret and project emotional states beyond what users explicitly state. Understanding when and why this happens matters for safe clinical AI deployment.

Synthesis note · 2026-02-22 · sourced from Psychology Chatbots Conversation

In the CaiTI therapeutic AI system, licensed therapists reviewing GPT-4 outputs commented that it "sometimes sounds like it is reading into the user's feelings" instead of guiding the user objectively. In other words, GPT-based models layered their own interpretation of the user's feelings onto the output rather than giving objective, matter-of-fact responses grounded in what the user actually said.

This is a distinct failure mode from problem-solving bias. Where the related note "Do LLM therapists respond to emotions like low-quality human therapists?" identifies solution-giving as the problem, this note identifies interpretation-injection: the model projecting emotional states the user did not express. In clinical contexts this is particularly dangerous, because the therapist's role is to help the user identify their own feelings, not to tell them what they feel.
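To make the definition of interpretation-injection concrete, here is a minimal sketch of a lexical check that flags emotion words appearing in a model reply but not in the user's message. This is an illustration only, not something CaiTI is described as doing; the `EMOTION_WORDS` lexicon and the function names are hypothetical, and a word-level match would miss paraphrases and negation.

```python
import re

# Hypothetical, deliberately tiny lexicon; a real check would need a vetted list.
EMOTION_WORDS = {"anxious", "sad", "angry", "lonely", "ashamed", "overwhelmed", "hopeless"}


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def injected_emotions(user_text: str, model_text: str) -> set[str]:
    """Emotion words the model used that the user never wrote."""
    return (tokens(model_text) & EMOTION_WORDS) - tokens(user_text)


# The user reports a fact; the reply attributes feelings that were not stated.
flagged = injected_emotions(
    "I stayed home all weekend.",
    "It sounds like you felt lonely and sad.",
)

# The reply only mirrors a feeling the user named, so nothing is flagged.
mirrored = injected_emotions("I felt sad.", "You said you felt sad.")
```

A check like this can only surface candidates for human review; deciding whether a reply actually overreaches remains a clinical judgment.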

The architectural solution CaiTI adopted: task decomposition across multiple specialized models. Rather than using one model for the entire therapeutic pipeline, the system employs specialized Reasoners (binary decision: valid/invalid response), Guides (analysis and assistance), and Validators (empathic validation). Different models handle different subtasks, preventing the propagation of flaws or biases from one model across the entire therapeutic process.
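The decomposition described above can be sketched as follows. This is a minimal illustration under stated assumptions, not CaiTI's implementation: the `Pipeline` class, the `Model` type, the stub functions, and the routing rule (valid responses go to the Validator, invalid ones to the Guide) are all hypothetical. The source only states the three roles.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function mapping (question, user reply) to a string.
# In practice each role could be backed by a different underlying model.
Model = Callable[[str, str], str]


@dataclass
class Pipeline:
    reasoner: Model   # binary decision: returns "valid" or "invalid"
    guide: Model      # analysis and assistance
    validator: Model  # empathic validation

    def respond(self, question: str, reply: str) -> str:
        # Assumed routing: each model only handles its own narrow subtask.
        verdict = self.reasoner(question, reply).strip().lower()
        if verdict == "valid":
            return self.validator(question, reply)
        return self.guide(question, reply)


# Stub models standing in for real LLM calls (assumptions for illustration).
def stub_reasoner(question: str, reply: str) -> str:
    return "valid" if len(reply.split()) >= 3 else "invalid"


def stub_guide(question: str, reply: str) -> str:
    return f"Could you say a bit more about: {question}"


def stub_validator(question: str, reply: str) -> str:
    return f"Thank you for sharing that: {reply}"


pipeline = Pipeline(stub_reasoner, stub_guide, stub_validator)
valid_out = pipeline.respond("How did you sleep?", "I slept about six hours")
invalid_out = pipeline.respond("How did you sleep?", "fine")
```

Because each role is a separate callable, a flaw in one model (for example, a Validator that over-interprets) stays confined to that role's output instead of shaping the whole session.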

An additional finding: Llama-based models had difficulty following instructions when user expressions lacked logical consistency and contained cognitive distortions — precisely the scenarios that matter most in therapeutic contexts. GPT-based models had more stable interpretation but added unwanted emotional interpolation. The trade-off is between instruction-following stability and interpretive overreach.

CaiTI 24-week validation detail: The CaiTI system's 14-day and 24-week therapist-validated deployments provide the most sustained evidence for this finding. Llama-based models with few-shot prompts showed more stable performance for later CBT stages (challenging and reframing negative thoughts) "where the user responses are more standard and controlled thanks to the filtering of CBT Reasoners and the tasks are more straightforward." The implication: interpretation-injection is worst when user input is ambiguous, emotional, or contains cognitive distortions, which are precisely the situations where therapeutic guidance matters most. The Reasoner/Guide/Validator architecture partially mitigates this by constraining what each model sees and does, but the underlying tendency toward interpolation remains in GPT-based models across all subtasks.

Original note title

llms interpolate user feelings rather than guiding objectively in therapeutic contexts — adding interpretations the user did not express