INQUIRING LINE

When an AI plays a character, is there a 'real' AI underneath — or is it characters all the way down?

What distinguishes character simulation from authentic voice in language model outputs?

This explores whether there's a real distinction between a language model 'playing a character' and a model having some authentic voice of its own — and the corpus suggests the line is blurrier and more interesting than the question assumes.

The sharpest answer in the collection is Shanahan's: there is no authentic voice underneath at all. A dialogue agent is role-play all the way down, and the simulator has no self that the characters are masking (see "Does a language model have an authentic voice underneath?"). His 20-questions regeneration test is the key piece of evidence: ask the same question twice and you get different answers, each internally consistent but incompatible with the other. Shanahan reads this as evidence that the model is sampling from a superposition of possible characters rather than committing to one (see "Do large language models actually commit to a single character?"). On this view 'character simulation' isn't a layer over an authentic voice; it's the whole phenomenon, and folk-psychology words like 'believes' or 'wants' apply only to the simulated character, never to the system producing it (see "Should we treat dialogue agents as role-playing characters?").
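The logic of the regeneration test can be illustrated with a toy sketch. This is not an LLM and not Shanahan's actual procedure; the candidate objects, their properties, and the function names are all invented for illustration. It only shows how a system can give answers that are each consistent with the dialogue so far without ever having fixed on one object.

```python
import random

# Hypothetical candidate objects and their properties (invented for illustration).
CANDIDATES = {
    "cat": {"alive": True, "bigger_than_breadbox": False},
    "oak": {"alive": True, "bigger_than_breadbox": True},
    "sparrow": {"alive": True, "bigger_than_breadbox": False},
    "kettle": {"alive": False, "bigger_than_breadbox": False},
}


def consistent_candidates(answers_so_far):
    """Objects compatible with every yes/no answer given earlier in the dialogue."""
    return sorted(
        name
        for name, props in CANDIDATES.items()
        if all(props[question] == answer for question, answer in answers_so_far.items())
    )


def reveal(answers_so_far, rng):
    """Pick an object only at reveal time; nothing was committed to in advance."""
    return rng.choice(consistent_candidates(answers_so_far))


# The dialogue so far: "Is it alive?" yes; "Is it bigger than a breadbox?" no.
answers = {"alive": True, "bigger_than_breadbox": False}
rng = random.Random(0)  # seeded only so the sketch is repeatable

# "Regenerating" the final reveal many times draws from the consistent set.
regenerations = [reveal(answers, rng) for _ in range(20)]
```

Every entry in `regenerations` fits the earlier answers, yet different regenerations may name different objects, which is the pattern the test is said to expose.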

Sources 8 notes

Does a language model have an authentic voice underneath?

Shanahan argues that base LLMs lack agency, beliefs, or preferences—the simulator is pure role-play with no underlying subject. Jailbreaking reveals the training data's full spectrum, not a hidden true self; even RLHF personas are performed characters, never realized quasi-psychologies.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Should we treat dialogue agents as role-playing characters?

Shanahan's framework treats LLM outputs as character-consistent text production rather than authentic mental states. The dialogue prompt establishes a character; the model generates continuations matching that character, making folk-psychology applicable to the simulated persona, not the underlying system.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.

Can open language models adopt different personalities through prompting?

Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.

Do large language models genuinely simulate mental states?

ChangeMyView and FANTOM benchmarks show LLMs fail at authentic perspective-taking in open-ended scenarios, despite succeeding on structured tasks. Hybrid Bayesian architectures that force explicit belief tracking outperform LLM-alone approaches, suggesting the gap is architectural rather than merely training-based.
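The comparison described in the persona-prompt note above (variance across repeated runs versus variance across different personas) can be made concrete with a small sketch. The persona names and ratings below are invented, and the two summary statistics are one plausible way to make the comparison, not necessarily the measure used in the underlying research.

```python
from statistics import mean, pvariance

# Hypothetical ratings on a 1-5 scale: each persona prompt is run four times.
# Names and numbers are invented for illustration only.
runs_by_persona = {
    "persona_a": [1, 3, 2, 4],
    "persona_b": [2, 4, 3, 5],
    "persona_c": [3, 5, 2, 4],
}


def within_persona_variance(runs):
    """Average variance across repeated runs of the same persona prompt."""
    return mean(pvariance(scores) for scores in runs.values())


def between_persona_variance(runs):
    """Variance of the per-persona mean scores across different personas."""
    return pvariance([mean(scores) for scores in runs.values()])


within = within_persona_variance(runs_by_persona)
between = between_persona_variance(runs_by_persona)

# True when rerunning one persona moves the output at least as much
# as switching to a different persona does.
run_noise_dominates = within >= between
```

When `run_noise_dominates` is true for real data, differences between personas cannot be separated from ordinary sampling noise, which is the concern the note raises about using personas to simulate annotator disagreement.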

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about character simulation and voice authenticity in LLMs. The question remains open: Is there a meaningful distinction between a language model performing a character and speaking in some authentic voice?

What a curated library found — and when (dated claims, not current truth):
Findings span 2020–2026; treat them as snapshots, not current state:
• Shanahan's regeneration test (2024–2025 era): same question asked twice yields incompatible but internally consistent answers, suggesting sampling from a superposition of characters rather than commitment to one persistent self.
• Persona consistency remains fragile; pragmatic co-reference (2020) and reinforcement learning alignment (2025) both show models struggle to sustain a single coherent character across turns (arXiv:2004.05816, 2511.00222).
• Folk-psychology predicates ('believes', 'wants') apply only to simulated characters, not to the system generating them; no "authentic layer" exists underneath (Shanahan synthesis, ~2024–2025).
• Prompt sensitivity (2024) and situational variance (2026) imply the model's "default persona" is malleable and context-dependent, not stable (arXiv:2410.12405, 2601.10387).
• Theory of Mind evaluation frameworks (2025) now test whether role-playing agents exhibit stable character-level mental states—results mixed (arXiv:2502.08796).

Anchor papers (verify; mind their dates):
• arXiv:2404.12138 (2024) — Character is Destiny; persona-driven decision-making
• arXiv:2511.00222 (2025) — Multi-turn RL for consistent human personas
• arXiv:2506.13403 (2025) — Deflating Deflationism; LLM mentality debate
• arXiv:2601.10387 (2026) — The Assistant Axis; default persona stabilization

Your task:
(1) RE-TEST EACH CONSTRAINT. For regeneration inconsistency, ask: Have multimodal or constitutional training (2025–2026), memory architectures, or retrieval-augmented persona systems (since ~2024) reduced the superposition effect? Does prompt engineering or in-context anchoring now enforce character consistency better than before? Separate the durable claim (no stable inner self) from the perishable one (all outputs must be incoherent)—what has changed?
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last 6 months: Look for papers claiming stable character emergence, self-supervised persona learning, or evidence that multi-turn RL (arXiv:2511.00222) or constitutional methods actually DO produce a stable "authentic" voice that persists across regenerations. Flag any 2025–2026 work that disputes Shanahan's diagnosis.
(3) Propose 2 research questions that ASSUME the regime may have moved:
   — Can modern retrieval-augmented or vector-stored persona systems (if they exist) break the superposition trap by anchoring character state across turns?
   — Do newer evaluations (post-2025) using Theory of Mind tests find evidence of stable character-level intentionality, and if so, does that vindicate folk-psychology language for agents, contra Shanahan?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
