What distinguishes character simulation from authentic voice in language model outputs?
This explores whether there is a real distinction between a language model 'playing a character' and a model having some authentic voice of its own. The corpus suggests the line is blurrier and more interesting than the question assumes.
The sharpest answer in the collection is Shanahan's: there is no authentic voice underneath at all. A dialogue agent is role-play all the way down, and the simulator has no self that the characters are masking (see: Does a language model have an authentic voice underneath?). His 20-questions regeneration test is the clever bit of evidence: regenerate the same response and you get different answers, each consistent with the preceding dialogue but incompatible with one another, which suggests the model is sampling from a superposition of possible characters rather than committing to one (see: Do large language models actually commit to a single character?). On this view 'character simulation' isn't a layer over an authentic voice; it's the whole phenomenon, and folk-psychology words like 'believes' or 'wants' apply only to the simulated character, never to the system producing it (see: Should we treat dialogue agents as role-playing characters?).
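The logic of the regeneration test can be sketched with a toy stand-in for the model. This is an illustration only, not Shanahan's setup and not a real LLM: the five-object world, the feature names, and the helper functions `consistent_objects` and `reveal` are all hypothetical. The "simulator" never stores a chosen object; each reveal is sampled fresh from whatever remains consistent with the answers already given.

```python
import random

# Hypothetical toy world: each object is described by yes/no features.
OBJECTS = {
    "cat":   {"alive": True,  "bigger_than_breadbox": False},
    "oak":   {"alive": True,  "bigger_than_breadbox": True},
    "spoon": {"alive": False, "bigger_than_breadbox": False},
    "piano": {"alive": False, "bigger_than_breadbox": True},
    "mouse": {"alive": True,  "bigger_than_breadbox": False},
}

def consistent_objects(history):
    """Return the objects that agree with every (feature, answer) pair so far."""
    return sorted(
        name
        for name, feats in OBJECTS.items()
        if all(feats[feature] == answer for feature, answer in history)
    )

def reveal(history, rng):
    """Sample a 'final answer' from the objects still consistent with the dialogue.

    Nothing was committed to earlier, so repeated calls can disagree.
    """
    return rng.choice(consistent_objects(history))

# Answers the simulator has already given in this dialogue.
history = [("alive", True), ("bigger_than_breadbox", False)]

# "Regenerate" the reveal many times from the same dialogue state.
rng = random.Random(0)
reveals = {reveal(history, rng) for _ in range(50)}
print(reveals)
```

Every sampled reveal fits the dialogue history, yet different regenerations name different objects. That is the pattern the test treats as evidence against a single committed character.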
Sources 8 notes
Shanahan argues that base LLMs lack agency, beliefs, or preferences—the simulator is pure role-play with no underlying subject. Jailbreaking reveals the training data's full spectrum, not a hidden true self; even RLHF personas are performed characters, never realized quasi-psychologies.
Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.
Shanahan's framework treats LLM outputs as character-consistent text production rather than authentic mental states. The dialogue prompt establishes a character; the model generates continuations matching that character, making folk-psychology applicable to the simulated persona, not the underlying system.
Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.
Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.
System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.
When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.
ChangeMyView and FANTOM benchmarks show LLMs fail at authentic perspective-taking in open-ended scenarios, despite succeeding on structured tasks. Hybrid Bayesian architectures that force explicit belief tracking outperform LLM-alone approaches, suggesting the gap is architectural rather than merely training-based.
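The run-versus-persona comparison described in the persona-variance note above can be sketched as a simple variance split. This is a minimal sketch under stated assumptions: the function name `variance_split`, the persona labels, and the scores are all made up for illustration and are not data from any of the cited studies.

```python
from statistics import mean, pvariance

def variance_split(scores_by_persona):
    """Compare run-to-run variance with persona-to-persona variance.

    scores_by_persona maps a persona label to the numeric outputs
    obtained by re-running the same persona prompt several times.
    Returns (within, between):
      within  = average variance across repeated runs of one persona
      between = variance of the per-persona mean scores
    """
    within = mean(pvariance(runs) for runs in scores_by_persona.values())
    between = pvariance([mean(runs) for runs in scores_by_persona.values()])
    return within, between

# Made-up illustrative scores (hypothetical, not real measurements).
toy_scores = {
    "persona_a": [1, 3, 5],
    "persona_b": [2, 4, 6],
}

within, between = variance_split(toy_scores)
print(f"within-persona variance:  {within:.3f}")
print(f"between-persona variance: {between:.3f}")
```

When `within` matches or exceeds `between`, as in this toy data, changing the persona prompt moves the output no more than simply re-running the same prompt does.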