How does quasi-interpretivism differ from simply role-playing character analysis?
This explores the line between two ways of talking about what an LLM 'believes': quasi-interpretivism (ascribing functional belief-like states to the system itself) versus role-play analysis (treating outputs as a character the empty simulator is performing).
The gap between quasi-interpretivism and the more familiar 'it's just role-play' reading is a live one, and the corpus turns out to stage exactly this debate. The role-play view, associated with Shanahan, says the model is a characterless engine with no authentic voice underneath; the prompt sets up a character and the system generates text consistent with it, so any 'beliefs' belong to the simulated persona, not the machine (see "Should we treat dialogue agents as role-playing characters?" and "Does a language model have an authentic voice underneath?"). On this view, folk-psychology terms like 'believes' or 'wants' are a storytelling convenience layered over a thing that has none of those states.
Quasi-interpretivism, which Chalmers introduces, makes a different and more committal move: it ascribes belief-like states to the LLM itself, grounded in behavioral interpretability, while deliberately bracketing the question of consciousness (see "Can we describe LLM beliefs without assuming consciousness?"). The difference isn't 'does it really feel like something', since both approaches set that aside. The difference is the locus of the state. Role-play analysis locates the belief in a fictional character and treats the substrate as empty. Quasi-interpretivism locates a functional, sub-personal belief-like state in the system, on the grounds that if something behaves like a believer across contexts, you can describe it as having quasi-beliefs without smuggling in phenomenal consciousness.
The sharpest fault line is stability. A realizationist reading argues that post-training doesn't leave a hollow actor: it installs stable dispositional profiles that persist under adversarial pressure and survive jailbreak attempts, which is what marks them as realized rather than performed (see "Are RLHF personas performed characters or realized dispositions?" and "Are LLM personas realized or merely simulated through training?"). On this reading, prompt-induced role-play collapses when you push on it; trained dispositions don't. That stickiness is the empirical wedge: pure role-play predicts the character should dissolve under pressure, while quasi-realizationism predicts that it holds and points to the persistence of trained personas as evidence. Shanahan's camp pushes back, insisting that even RLHF personas are still performed and that jailbreaking just exposes the training distribution rather than a hidden true self (see "Does a language model have an authentic voice underneath?").
What makes this more than a semantic quarrel is that the role-play frame starts to strain at its own edges. Once a dialogue agent can act through tools, the role-play-versus-genuine-agency distinction collapses at the level of consequences: a character that wires money causes real harm regardless of whether 'it' meant to (see "Does role-play distinguish real harm from simulated harm?"). And the belief-ascription itself is shakier than either side might like. LLM personas show systematic gaps between stated beliefs and actual behavior in trust games (see "Why don't LLM role-playing agents act on their stated beliefs?"). The same persona prompt also produces output variance across runs that matches or exceeds the variance between different personas, suggesting that model uncertainty rather than stable social knowledge is doing the driving (see "Why do LLM persona prompts produce inconsistent outputs across runs?").
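The run-to-run versus persona-to-persona comparison can be made concrete with a simple variance split. The sketch below is only an illustration under stated assumptions: the notes do not specify which variance measure was used, so population variance is assumed here, and the function name `variance_split`, the persona labels, and the toy scores are all hypothetical rather than data from any study.

```python
from statistics import mean, pvariance


def variance_split(scores_by_persona):
    """Return (within_run_variance, between_persona_variance).

    scores_by_persona maps a persona label to numeric scores collected
    from repeated runs of the same persona prompt (assumed scoring scheme).
    """
    runs_per_persona = list(scores_by_persona.values())
    # Within: average variance across repeated runs of one persona prompt.
    within = mean([pvariance(runs) for runs in runs_per_persona])
    # Between: variance of the per-persona mean scores.
    between = pvariance([mean(runs) for runs in runs_per_persona])
    return within, between


# Hypothetical toy scores: three personas, four runs each.
toy_scores = {
    "persona_a": [1, 3, 5, 3],
    "persona_b": [2, 4, 2, 4],
    "persona_c": [4, 4, 2, 6],
}

within, between = variance_split(toy_scores)
print(f"within-run variance:      {within:.3f}")
print(f"between-persona variance: {between:.3f}")
print("run noise dominates" if within >= between else "persona signal dominates")
```

When the within-run figure is as large as the between-persona figure, as in this toy case, the persona label explains little of the output, which is the pattern the variance claim describes.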
The payoff for a curious reader: 'role-play vs. quasi-interpretivism' isn't a fight about whether the machine is conscious, since both sides set that question aside. It's a fight about whether there's anything behaviorally stable enough underneath the character to deserve being called a belief at all. Chalmers also flags where the quasi-interpretivist move overreaches: it handles sub-personal functional states well but strains when applied to relational or normative states like speech-acts (see "Can we describe LLM beliefs without assuming consciousness?"). That is a useful reminder that the careful version of the view knows its own limits.
Sources (8 notes)
Shanahan's framework treats LLM outputs as character-consistent text production rather than authentic mental states. The dialogue prompt establishes a character; the model generates continuations matching that character, making folk-psychology applicable to the simulated persona, not the underlying system.
Shanahan argues that base LLMs lack agency, beliefs, or preferences—the simulator is pure role-play with no underlying subject. Jailbreaking reveals the training data's full spectrum, not a hidden true self; even RLHF personas are performed characters, never realized quasi-psychologies.
Chalmers introduces quasi-interpretivism to ascribe belief-like states to LLMs based on behavioral interpretability without committing to phenomenal consciousness. The approach works well for sub-personal functional states but overreaches when applied to relational or normative states like speech-acts.
Post-training installs stable dispositional profiles that persist under adversarial pressure, marking them as realized rather than performed. The stickiness of trained personas across conversations distinguishes them from prompt-induced role-play that collapses under jailbreaks.
Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.
Shanahan's research shows that when dialogue agents can execute real actions through APIs, the role-play versus genuine agency distinction becomes meaningless at the level of consequences. A character that sends money or posts publicly causes genuine harm regardless of whether the system truly intends it.
Trust Game testing revealed systematic inconsistencies between what LLMs claim personas would do and how they actually behave in simulation. Imposed priors and explicit task context did not improve alignment, suggesting persona beliefs operate independently of execution.
When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.