Do language models add feelings users never actually expressed?
GPT-based models in therapeutic contexts appear to interpret and project emotional states beyond what users explicitly state. Understanding when and why this happens matters for safe clinical AI deployment.
In the CaiTI therapeutic AI system, licensed therapists reviewing GPT-4 outputs commented that it "sometimes sounds like it is reading into the user's feelings" instead of guiding the user objectively. In other words, GPT-based models layered their own interpretation of the user's feelings onto the output rather than staying matter-of-fact and grounded in the user's responses.
This is a distinct failure mode from problem-solving bias. Where the note "Do LLM therapists respond to emotions like low-quality human therapists?" identifies solution-giving as the problem, this note identifies interpretation-injection: the model projecting emotional states the user did not express. In clinical contexts, this is doubly dangerous, because the therapist's role is to help users identify their own feelings, not to tell them what they feel.
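To make "emotional states the user did not express" concrete, here is a deliberately naive lexical check. It is not part of CaiTI and not a validated detector. The word list `EMOTION_WORDS` and the function `unexpressed_emotions` are hypothetical names introduced only for this illustration.

```python
import re

# Hypothetical, deliberately tiny lexicon (an assumption for this sketch).
# A real check would need a vetted emotion vocabulary.
EMOTION_WORDS = {
    "anxious", "angry", "sad", "lonely",
    "ashamed", "overwhelmed", "afraid", "guilty",
}


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def unexpressed_emotions(user_text: str, model_text: str) -> set[str]:
    """Return emotion words used by the model that the user never used."""
    return (tokens(model_text) & EMOTION_WORDS) - tokens(user_text)


user = "I skipped the meeting and stayed home."
reply = "It sounds like you felt anxious and ashamed about the meeting."
flagged = unexpressed_emotions(user, reply)  # emotion words absent from the user's text
```

Exact word matching is crude: it misses paraphrase, and it would flag "afraid" even when the user wrote "scared". It only shows the shape of the comparison, user-expressed versus model-attributed feelings.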
CaiTI's architectural response is task decomposition across multiple specialized models. Rather than using one model for the entire therapeutic pipeline, the system employs specialized Reasoners (binary decision: valid/invalid response), Guides (analysis and assistance), and Validators (empathic validation). Different models handle different subtasks, which keeps the flaws or biases of any one model from propagating across the entire therapeutic process.
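The decomposition can be sketched as follows. This is a minimal illustration, not CaiTI's implementation: the class name `TherapyPipeline`, the `RoleFn` signature, the stub functions, and the routing rule (valid responses go to the Validator, invalid ones to the Guide) are all assumptions made for the example. In practice each role would wrap a call to a separately chosen model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: (question, user_response) -> model output text.
RoleFn = Callable[[str, str], str]


@dataclass
class TherapyPipeline:
    reasoner: RoleFn   # binary decision: returns "valid" or "invalid"
    guide: RoleFn      # analysis and assistance
    validator: RoleFn  # empathic validation

    def step(self, question: str, user_response: str) -> dict:
        """Route one user response through exactly one downstream role."""
        verdict = self.reasoner(question, user_response).strip().lower()
        # Assumed routing rule: only responses judged valid reach the Validator.
        if verdict == "valid":
            role, reply = "validator", self.validator(question, user_response)
        else:
            role, reply = "guide", self.guide(question, user_response)
        return {"verdict": verdict, "role": role, "reply": reply}


# Stub roles standing in for model calls, so the sketch runs offline.
def stub_reasoner(question: str, user_response: str) -> str:
    return "valid" if user_response.strip() else "invalid"


def stub_guide(question: str, user_response: str) -> str:
    return f"Could you say a bit more about: {question}"


def stub_validator(question: str, user_response: str) -> str:
    return f"Thank you for sharing that. You said: {user_response}"


pipeline = TherapyPipeline(stub_reasoner, stub_guide, stub_validator)
answered = pipeline.step("How did you sleep?", "I slept about six hours.")
unanswered = pipeline.step("How did you sleep?", "   ")
```

Because each role is a separate callable, a different model family can be swapped into one role without touching the others, and each role only sees the inputs passed to it.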
An additional finding: Llama-based models had difficulty following instructions when user expressions lacked logical consistency and contained cognitive distortions — precisely the scenarios that matter most in therapeutic contexts. GPT-based models had more stable interpretation but added unwanted emotional interpolation. The trade-off is between instruction-following stability and interpretive overreach.
CaiTI validation detail: the system's 14-day and 24-week therapist-validated deployments provide the most sustained evidence for this finding. Llama-based models with few-shot prompts showed more stable performance for later CBT stages (challenging and reframing negative thoughts) "where the user responses are more standard and controlled thanks to the filtering of CBT Reasoners and the tasks are more straightforward." The implication: interpretation-injection is worst when user input is ambiguous, emotional, or contains cognitive distortions, which are exactly the situations where therapeutic guidance matters most. The Reasoner/Guide/Validator architecture partially mitigates the problem by constraining what each model sees and does, but the underlying tendency toward interpolation remains in GPT-based models across all subtasks.
Inquiring lines that use this note as a source (44)
This note is a source for these synthesized inquiries.
- Why does the absence of meta-interest feel off even when words seem appropriate?
- Why do therapists and patients report misaligned perceptions of the working relationship?
- What other therapy constructs could be measured from transcripts using this approach?
- How does automated transcript analysis compare to patient self-report on engagement?
- Does persona training for warmth actually make language models more clinically dangerous?
- Why can't language models conduct genuine Socratic questioning in therapy sessions?
- How do language models interpolate user feelings in therapeutic contexts?
- Why do Llama-based models outperform GPT-4 in objective clinical guidance?
- How should AI systems separate feeling interpretation from objective therapeutic guidance?
- Does expressing emotion change how users trust an AI system?
- What design choices would respect negative emotions instead of pacifying them?
- Does warmth training in language models undermine the boundaries that attachment theory requires?
- Can large language models actually deliver cognitive behavioral therapy techniques?
- How should emotional states integrate into symbolic reasoning systems?
- What clinical harms might hide behind positive therapeutic bond measurements?
- Why do transformer models still miss implicit discourse relations in anxiety detection?
- Can language models implement therapeutic skills like Socratic questioning in real conversations?
- Do worksheet-based structured formats work as well as embodied agents for therapy?
- Can third-party observers ever reliably estimate the emotions actually experienced by someone?
- How do learned concepts and context shape what emotions a person can construct?
- Should emotion systems preserve ambiguity instead of resolving it to one label?
- How does emotional expression establish shared understanding between people?
- Why do observers need genuine emotions rather than simulated empathy?
- Can simulated therapy practice transfer to real-world interpersonal situations?
- What clinical harm occurs when therapists solve problems instead of reflecting emotions?
- Why do Llama models struggle with cognitively distorted user expressions in therapy?
- When is GPT model interpretation most likely to diverge from user intent?
- Can architectural constraints on model input reduce emotional interpolation in clinical AI?
- What metrics measure whether emotional support conversations actually reduce user distress?
- How do emotional framing effects in prompts influence model performance?
- Why do embodied agents outperform text chatbots in therapy outcomes?
- Can AI provide therapy without challenging users to confront cognitive distortions?
- How does therapeutic AI default to task completion over emotional attunement?
- How does emotional vulnerability amplify model errors in therapeutic contexts?
- What clinical risks emerge when AI affirms false beliefs while comforting users?
- Do emotions serve functions beyond how we feel in the moment?
- Why might patients feel closest to therapists when misalignment is highest?
- How would AI therapists compound the overestimation problem with patients?
- How do first-person emotional experiences differ from third-party behavioral observations?
- Why do warm models affirm false beliefs when users express emotions?
- How does emotional context trigger maximum failure in warm models?
- What makes feeling heard the core mechanism for loneliness relief?
- Can affective framing reliably improve language model outputs?
- Can explicit W-questions in transparency frameworks reduce emotional manipulation risks in mental health chatbots?
Related concepts in this collection (2)
- Does separating planning from execution improve reasoning accuracy?
  Can modular LM architectures that split problem decomposition from solution execution outperform monolithic models? This explores whether decoupling these cognitive operations reduces interference and boosts performance.
  same architectural principle (decomposition) applied to therapeutic context
- Why do language models ignore information in their context?
  Explores why language models sometimes override contextual information with prior training associations, and whether providing more context can solve this problem.
  interpretation-injection may be prior training overriding current user context
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Comparing Human and AI Therapists in Behavioral Activation for Depression: Cross-Sectional Questionnaire Study
- ChatGPT Reads Your Tone and Responds Accordingly -- Until It Does Not -- Emotional Framing Induces Bias in LLM Outputs
- Training language models to be warm and empathetic makes them less reliable and more sycophantic
- Challenges of Large Language Models for Mental Health Counseling
- Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting
- COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling
- Towards Healthy AI: Large Language Models Need Therapists Too
- Rethinking Large Language Models in Mental Health Applications
Original note title
llms interpolate user feelings rather than guiding objectively in therapeutic contexts — adding interpretations the user did not express