Why don't LLM role-playing agents act on their stated beliefs?
When LLMs articulate what a persona would do in the Trust Game, their simulated actions contradict those stated beliefs. This note explores whether the gap reflects deeper inconsistencies in how language models apply knowledge to behavior.
Using the Trust Game as a behavioral benchmark, researchers found systematic inconsistencies between LLMs' stated beliefs about how personas would behave and the actual outcomes of their role-playing simulation — at both individual and population levels. Even when models appear to encode plausible beliefs, they fail to apply them consistently.
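One way to make the individual-level versus population-level distinction concrete is the minimal sketch below. It is an illustration under stated assumptions, not the study's method: the function names, the 0 to 10 "amount sent" scale, and the toy numbers are all hypothetical, and the metrics (Pearson correlation, mean absolute gap, difference of means) are generic choices that may differ from those the researchers used.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def individual_consistency(stated, simulated):
    """Per-persona agreement between stated belief and simulated action.

    stated[i]    -- amount the model says persona i would send (assumed 0-10)
    simulated[i] -- amount persona i actually sends in the role-play (assumed 0-10)
    """
    if len(stated) != len(simulated) or len(stated) < 2:
        raise ValueError("need two equal-length sequences with at least 2 personas")
    gaps = [abs(a - b) for a, b in zip(stated, simulated)]
    return {"pearson": pearson(stated, simulated), "mean_abs_gap": mean(gaps)}


def population_gap(stated, simulated):
    """Population-level shift: average simulated amount minus average stated amount."""
    return mean(simulated) - mean(stated)


# Hypothetical toy values, invented purely for illustration (not real results).
stated_beliefs = [5, 7, 3, 8, 6]
simulated_actions = [6, 4, 5, 5, 5]

print(individual_consistency(stated_beliefs, simulated_actions))
print(population_gap(stated_beliefs, simulated_actions))
```

The two views can disagree: population averages may sit close together while per-persona correlation is weak or negative, which is why both levels are worth checking separately.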
Key findings:
- Explicit task context during belief elicitation does not improve consistency.
- Self-conditioning enhances alignment in some models.
- Imposed priors tend to undermine rather than improve consistency.
- Individual-level forecasting accuracy degrades over longer horizons.
In-context prompting may struggle to override entrenched model priors, limiting researchers' ability to test alternative theories or correct biases.
This connects to the knowing-doing gap documented elsewhere in the vault. Read alongside "Can language models understand without actually executing correctly?", the belief-behavior inconsistency in role-playing is a social-cognitive instance of the same split-brain phenomenon: the model can articulate what a persona would do without being able to enact it. And read alongside "Do personas make language models reason like biased humans?", the failure of imposed priors to improve consistency suggests that persona beliefs are not controllable through prompting alone.
Inquiring lines that use this note as a source (9)
This note is a source for these synthesized inquiries.
- Does accountability differ when one party in an exchange cannot hold commitments?
- Can prompting a deceptive role change how an LLM tailors its lies?
- Can LLM therapists develop character knowledge to decide when advice-giving fits?
- How do different social roles affect LLM theory of mind errors?
- Why do role-playing agents show belief-behavior inconsistency in their outputs?
- How does quasi-interpretivism differ from simply role-playing character analysis?
- What would consciousness require that pure roleplay LLMs cannot provide?
- Does villain roleplay failure reveal why LLMs cannot adopt genuine controversial positions?
- Do realistic LLM behaviors require simulating human thought or just behavior?
Related concepts in this collection (2)
- Can language models understand without actually executing correctly?
  Do LLMs truly comprehend problem-solving principles if they consistently fail to apply them? This explores whether the gap between articulate explanations and failed actions points to a fundamental architectural limitation.
  Link: belief-behavior inconsistency as social-cognitive split-brain
- Do personas make language models reason like biased humans?
  When LLMs are assigned personas, do they develop the same identity-driven reasoning biases that humans exhibit? And can standard debiasing techniques counteract these effects?
  Link: imposed priors fail to override entrenched model priors
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust
- Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
- PersonaGym: Evaluating Persona Agents and LLMs
- Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
- Role play with large language models
- Role-Play with Large Language Models
- Large Language Models Do Not Simulate Human Psychology
- Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
Original note title
LLM role-playing agents show systematic belief-behavior inconsistency — stated beliefs fail to predict simulated actions even when beliefs appear plausible