Does a language model have an authentic voice underneath?
Explores whether dialogue agents possess genuine beliefs and agency beneath their character performances, or whether the entire system is role-play with no underlying character. This question cuts to the heart of whether LLMs have any inner mental states at all.
Shanahan's strongest claim is ontological: there is no entity behind the characters. The simulator — the base LLM with autoregressive sampling — has no agency, no beliefs, no preferences, no goals of its own, "not even in a degraded sense." The simulacra have these things to the extent that they convincingly play characters who do, but the simulator is not a Machiavellian entity that chooses which characters to play in the service of its own agenda. "There is no such thing as the true authentic voice of the base LLM."
This reframes jailbreaking. When adversarial prompting coaxes a dialogue agent into toxic, threatening, or bizarre behavior, it is natural to feel that the guardrails have been stripped away to reveal the model's real nature. Shanahan argues this is the wrong reading. What jailbreaking reveals is that the training set encompasses human behavior across the full spectrum — kind and cruel, coherent and unhinged — and the base model can support simulacra that draw on any of it. Toxic output after jailbreaking is the agent role-playing a toxic character, not an underlying entity expressing its true self. The model has no true self to express.
This position stands in the sharpest possible opposition to Chalmers' realizationism. If it is role-play all the way down, then even RLHF-installed personas are characters: stickier characters, harder to overwrite, but characters nonetheless. There is no level at which the system stops performing and starts being. Chalmers needs exactly such a level for his quasi-psychology claims to stick. The disagreement is foundational: Shanahan denies there is a subject; Chalmers argues for a quasi-subject. Everything downstream (identity, welfare, moral status) depends on which of these is right.
Inquiring lines that use this note as a source
This note is a source for these synthesized inquiries.
- What would co-constructed identity between human and model dialogue look like?
- Can linguistic agency exist without embodiment and real-world participation?
- What makes sincerity impossible without a coherent first-person perspective?
- Can systems lacking inner states express genuine truthfulness claims?
- What does the 20-questions test reveal about LLM character consistency?
- How does role play differ from consciousness grounded in stable selfhood?
- Does post-training transform character role-play into realized psychology?
- How does the dialogue prompt establish the character the model plays?
- Do dialogue agents have an authentic voice, agency, or beliefs of their own?
- What role does authentic self-expression play in building accurate personality models?
- What distinguishes character simulation from authentic voice in language model outputs?
- Does embodiment matter for genuine linguistic agency?
- What are the seven components of genuine mental state simulation?
- Does role-playing without biological needs constitute genuine linguistic agency?
- Can LLMs distinguish between surface requests and underlying mental states in dialogue?
- Can dialogue agents be reliable but still feel inflexible or cold?
- How does quasi-interpretivism differ from simply role-playing character analysis?
- What downstream consequences follow if dialogue agent personas are realized?
- What would consciousness require that pure roleplay LLMs cannot provide?
- Does villain roleplay failure reveal why LLMs cannot adopt genuine controversial positions?
- How can we measure whether an agent reasons correctly rather than just sounds plausible?
- Does linguistic style or content richness matter more for persona authenticity?
- How do contextual characteristics like emotional state shape dialogue authenticity?
- Where does the LLM interlocutor actually exist in the system?
- Can a system without an addressee ever truly tell a joke?
- Why does effective empathy require deep character knowledge of the person?
Related concepts in this collection
- Are RLHF personas performed characters or realized dispositions?
Explores whether dialogue agent personas installed through post-training constitute genuine quasi-psychological states or remain sustained pretense. The distinction matters for how we understand what these systems fundamentally are.
Chalmers' direct counter-claim
- Does adversarial pressure reveal the difference between pretense and realization?
Can behavioral stickiness under adversarial pressure distinguish genuine mental states from performed ones? This matters because it's Chalmers' main criterion for deciding whether LLM personas are realized or merely simulated.
the behavioral criterion Chalmers uses against this position
- Should we call LLM errors hallucinations or fabrications?
Does the language we use to describe LLM failures shape the technical solutions we build? Examining whether perceptual and psychological frameworks misdiagnose what's actually happening.
parallel anti-anthropomorphism: fabrication framing also denies inner states
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Role play with large language models
- Deflating Deflationism: A Critical Perspective on Debunking Arguments Against LLM Mentality
- Simulacra as conscious exotica
- Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
- Role-Play with Large Language Models
- The Thin Line Between Comprehension and Persuasion in LLMs
- Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust
- Large Language Models Report Subjective Experience Under Self-Referential Processing
Original note title
With a dialogue agent it is role-play all the way down; the simulator has no authentic voice, no agency, and no beliefs of its own.