Do language models add feelings users never actually expressed?

GPT-based models in therapeutic contexts appear to interpret and project emotional states beyond what users explicitly state. Understanding when and why this happens matters for safe clinical AI deployment.

Synthesis note · 2026-02-22 · sourced from Psychology Chatbots Conversation

In the CaiTI therapeutic AI system, licensed therapists reviewing GPT-4 outputs commented that it "sometimes sounds like it is reading into the user's feelings" instead of guiding the user objectively. In other words, GPT-based models layered their own interpretation of the user's feelings onto the output instead of giving objective, matter-of-fact responses grounded in what the user actually said.

This failure mode is distinct from problem-solving bias. Where the note "Do LLM therapists respond to emotions like low-quality human therapists?" identifies solution-giving as the problem, this note identifies interpretation-injection: the model projecting emotional states the user did not express. In clinical contexts this is especially risky, because the therapist's role is to help users identify their own feelings, not to tell them what they feel.
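To make the failure mode concrete, here is a minimal sketch of how interpretation-injection could be flagged: it lists emotion words that appear in a model reply but not in the user's message. The lexicon `EMOTION_TERMS` and the function `injected_emotions` are hypothetical, invented for illustration. They are not part of CaiTI, and a word-list match is far cruder than the therapist review described above.

```python
import re

# Hypothetical, deliberately tiny emotion lexicon (an assumption for illustration).
EMOTION_TERMS = {
    "anxious", "angry", "ashamed", "guilty",
    "hopeless", "lonely", "sad", "overwhelmed",
}


def tokens(text: str) -> set[str]:
    """Return the lowercase word tokens of a text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def injected_emotions(user_text: str, model_reply: str) -> set[str]:
    """Return emotion terms the reply uses that the user never wrote."""
    return (tokens(model_reply) & EMOTION_TERMS) - tokens(user_text)


user = "I skipped the gym again this week."
reply = "It sounds like you feel guilty and ashamed about skipping the gym."
print(sorted(injected_emotions(user, reply)))  # prints ['ashamed', 'guilty']
```

A reply that only echoes a feeling the user named (for example, repeating "sad" after the user wrote "sad") yields an empty set under this check.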

The architectural solution CaiTI adopted is task decomposition across multiple specialized models. Rather than using one model for the entire therapeutic pipeline, the system employs specialized Reasoners (a binary decision: is the response valid or invalid), Guides (analysis and assistance), and Validators (empathic validation). Different models handle different subtasks, which is meant to keep the flaws or biases of one model from propagating across the entire therapeutic process.
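The decomposition described above can be sketched as a small pipeline in which each role is a separate callable, so a different model could sit behind each one. This is an illustrative sketch only. The role names follow the note, but every function and type here is hypothetical, the stub logic merely stands in for model calls, and the routing rule (valid answers go to the Validator, invalid ones to the Guide) is an assumption, not CaiTI's documented implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Each role is a plain callable, so a different model could back each one.
# All names are hypothetical; the stubs below stand in for real model calls.
Reasoner = Callable[[str, str], bool]  # (question, answer) -> is the answer valid?
Guide = Callable[[str, str], str]      # (question, answer) -> help toward a valid answer
Validator = Callable[[str], str]       # (answer) -> empathic validation


@dataclass
class Turn:
    valid: bool
    reply: str


def run_turn(question: str, answer: str,
             reasoner: Reasoner, guide: Guide, validator: Validator) -> Turn:
    """Route one answer: the Reasoner decides, then exactly one other role replies."""
    if reasoner(question, answer):
        return Turn(True, validator(answer))
    return Turn(False, guide(question, answer))


# Stub roles (assumptions for illustration, not real model behavior).
def stub_reasoner(question: str, answer: str) -> bool:
    return len(answer.split()) >= 3  # placeholder validity rule


def stub_guide(question: str, answer: str) -> str:
    return f"Could you say a bit more? {question}"


def stub_validator(answer: str) -> str:
    return "Thank you for sharing that."


q = "How did you sleep last night?"
print(run_turn(q, "Fine.", stub_reasoner, stub_guide, stub_validator))
print(run_turn(q, "I woke up three times.", stub_reasoner, stub_guide, stub_validator))
```

In this sketch the only thing passed from the Reasoner to the other roles is a single boolean, which is one way to picture how constraining what each model sees and does could limit the spread of one model's bias.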

An additional finding: Llama-based models had difficulty following instructions when user expressions lacked logical consistency and contained cognitive distortions, precisely the scenarios that matter most in therapeutic contexts. GPT-based models interpreted such input more stably but added unwanted emotional interpolation. The trade-off: Llama-based models lose instruction-following stability on messy input, while GPT-based models stay stable but overreach interpretively.

CaiTI 24-week validation detail: The CaiTI system's 14-day and 24-week therapist-validated deployments provide the most sustained evidence for this finding. Llama-based models with few-shot prompts showed more stable performance for later CBT stages (challenging and reframing negative thoughts) "where the user responses are more standard and controlled thanks to the filtering of CBT Reasoners and the tasks are more straightforward." The implication is that interpretation-injection is worst when user input is ambiguous, emotional, or contains cognitive distortions, which are precisely the situations where therapeutic guidance matters most. The Reasoner/Guide/Validator architecture partially mitigates the problem by constraining what each model sees and does, but the underlying tendency toward interpolation remains in GPT-based models across all subtasks.

Inquiring lines that read this note 44

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

Does conversational format create illusions of genuine AI communication?

Why does the absence of meta-interest feel off even when words seem appropriate?

How can real-time alliance measurement improve therapy outcomes?

Why do LLM chatbots fail as independent therapeutic agents?

How do evaluation biases undermine LLM quality assessment systems?

How does automated transcript analysis compare to patient self-report on engagement?

Can AI systems balance emotional competence with factual reliability?

How can humans calibrate appropriate trust in AI systems?

Does expressing emotion change how users trust an AI system?

How can emotions function as reliable information in reasoning and cognitive systems?

How can AI systems learn from failures without cascading errors?

When is GPT model interpretation most likely to diverge from user intent?

How do chatbots affect human self-disclosure and emotional engagement?

Can explicit W-questions in transparency frameworks reduce emotional manipulation risks in mental health chatbots?

Related concepts in this collection 2

Does separating planning from execution improve reasoning accuracy? Can modular LM architectures that split problem decomposition from solution execution outperform monolithic models? This explores whether decoupling these cognitive operations reduces interference and boosts performance.
same architectural principle (decomposition) applied to therapeutic context

Why do language models ignore information in their context? Explores why language models sometimes override contextual information with prior training associations, and whether providing more context can solve this problem.
interpretation-injection may be prior training overriding current user context

Original note title

llms interpolate user feelings rather than guiding objectively in therapeutic contexts — adding interpretations the user did not express
