Does a language model have an authentic voice underneath?

Explores whether dialogue agents possess genuine beliefs and agency beneath their character performances, or whether the entire system is characterless role-play. This question cuts to the heart of whether LLMs have any inner mental states at all.

Synthesis note · 2026-04-15 · sourced from Role-Play with Large Language Models

Shanahan's strongest claim is ontological: there is no entity behind the characters. The simulator — the base LLM with autoregressive sampling — has no agency, no beliefs, no preferences, no goals of its own, "not even in a degraded sense." The simulacra have these things to the extent that they convincingly play characters who do, but the simulator is not a Machiavellian entity that chooses which characters to play in the service of its own agenda. "There is no such thing as the true authentic voice of the base LLM."

This reframes jailbreaking. When adversarial prompting coaxes a dialogue agent into toxic, threatening, or bizarre behavior, it is natural to feel that the guardrails have been stripped away to reveal the model's real nature. Shanahan argues this is the wrong reading. What jailbreaking reveals is that the training set encompasses human behavior across the full spectrum — kind and cruel, coherent and unhinged — and the base model can support simulacra that draw on any of it. Toxic output after jailbreaking is the agent role-playing a toxic character, not an underlying entity expressing its true self. The model has no true self to express.

The position is the sharpest possible opposition to Chalmers' realizationism. If it is role-play all the way down, then even RLHF-installed personas are characters — stickier characters, harder to overwrite, but characters nonetheless. There is no level at which the system stops performing and starts being. Chalmers needs exactly such a level for his quasi-psychology claims to stick. The disagreement is foundational: Shanahan denies there is a subject; Chalmers argues for a quasi-subject. Everything downstream — identity, welfare, moral status — depends on which of these is right.

Inquiring lines that read this note 26

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How can language models sustain linguistic synchrony and intersubjectivity during dialogue?

What would co-constructed identity between human and model dialogue look like?

Is embodied interaction necessary for language meaning and genuine agency?

Why do language models reinforce false assumptions instead of correcting them?

Is model self-awareness based on genuine introspection or pattern matching?

Can LLM personas constitute genuine psychology or remain linguistic role-play?

Do language models learn genuine linguistic structure or just surface patterns?

What distinguishes character simulation from authentic voice in language model outputs?

How does rhetorical adaptation affect LLM persuasion and detectability?

Can LLMs distinguish between surface requests and underlying mental states in dialogue?

How should dialogue systems represent uncertainty from noisy speech input?

Can dialogue agents be reliable but still feel inflexible or cold?

How can conversational AI maintain consistent personas across conversations?

What downstream consequences follow if dialogue agent personas are realized?

How do interface design choices shape consciousness attribution?

What would consciousness require that pure roleplay LLMs cannot provide?

How can persona representations reduce language model variance and improve task accuracy?

Does linguistic style or content richness matter more for persona authenticity?

How do formal dialogue structures reveal conversation coherence mechanisms?

How do contextual characteristics like emotional state shape dialogue authenticity?

How can LLM user simulators model realistic goal-driven conversation?

Where does the LLM interlocutor actually exist in the system?

Can AI systems balance emotional competence with factual reliability?

Why does effective empathy require deep character knowledge of the person?

Related concepts in this collection 3

Are RLHF personas performed characters or realized dispositions?
Explores whether dialogue agent personas installed through post-training constitute genuine quasi-psychological states or remain sustained pretense. The distinction matters for how we understand what these systems fundamentally are.
Chalmers' direct counter-claim

Does adversarial pressure reveal the difference between pretense and realization?
Can behavioral stickiness under adversarial pressure distinguish genuine mental states from performed ones? This matters because it's Chalmers' main criterion for deciding whether LLM personas are realized or merely simulated.
the behavioral criterion Chalmers uses against this position

Should we call LLM errors hallucinations or fabrications?
Does the language we use to describe LLM failures shape the technical solutions we build? Examining whether perceptual and psychological frameworks misdiagnose what's actually happening.
parallel anti-anthropomorphism: fabrication framing also denies inner states
