Does an LLM commit to a single character or maintain many?
Explores whether language models lock into one personality or instead hold multiple consistent characters in a probability distribution that narrows over time. This matters because it changes how we interpret apparent inconsistencies in model behavior.
The simple role-play metaphor (one actor, one part) is too rigid for what LLMs actually do. Shanahan refines it using Janus's simulator framing: the LLM is a non-deterministic simulator capable of generating an infinity of characters (simulacra), and at any point in a conversation it maintains a superposition of the simulacra consistent with the preceding context. The superposition narrows as the conversation proceeds: each new turn rules out, or at least down-weights, characters inconsistent with what has been said, concentrating probability on an ever-smaller set.
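One way to picture this narrowing is as Bayesian updating over candidate characters. The sketch below is a toy built on stated assumptions:

- The four character names, the uniform prior, the two turns, and every likelihood value are invented for illustration.
- A real LLM keeps no explicit list of simulacra. Whatever distribution it holds is implicit in its next-token probabilities.
- The helper names `update`, `entropy_bits`, and `narrow` are hypothetical.

A likelihood of zero rules a character out. A small likelihood only down-weights it.

```python
import math

# Hypothetical toy: an explicit prior over four invented "characters".
# A real LLM keeps no such list; its distribution over simulacra is implicit
# in next-token probabilities. All names and numbers here are made up.
PRIOR = {
    "helpful_assistant": 0.25,
    "sarcastic_critic": 0.25,
    "pirate": 0.25,
    "medieval_scholar": 0.25,
}

# Hypothetical likelihoods P(turn | character) for two observed turns.
TURNS = [
    ("Ahoy! I'll help ye find the answer.",
     {"helpful_assistant": 0.3, "sarcastic_critic": 0.05,
      "pirate": 0.6, "medieval_scholar": 0.05}),
    ("Arr, the treasure be in the third chest.",
     {"helpful_assistant": 0.1, "sarcastic_critic": 0.1,
      "pirate": 0.9, "medieval_scholar": 0.1}),
]


def update(dist, likelihood):
    """One Bayesian step: reweight each character by how well it explains the turn.

    A likelihood of 0 removes a character entirely; a small one down-weights it.
    """
    unnorm = {c: p * likelihood.get(c, 0.0) for c, p in dist.items()}
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("turn is inconsistent with every character")
    return {c: w / total for c, w in unnorm.items()}


def entropy_bits(dist):
    """Shannon entropy in bits: a rough gauge of how spread out the superposition is."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def narrow(prior, turns):
    """Return (distribution, entropy) for the prior and after each turn."""
    dist = dict(prior)
    history = [(dist, entropy_bits(dist))]
    for _text, likelihood in turns:
        dist = update(dist, likelihood)
        history.append((dist, entropy_bits(dist)))
    return history


if __name__ == "__main__":
    for step, (dist, h) in enumerate(narrow(PRIOR, TURNS)):
        top = max(dist, key=dist.get)
        print(f"after {step} turn(s): entropy={h:.2f} bits, "
              f"most probable={top} ({dist[top]:.2f})")
```

With these made-up numbers, entropy falls after each turn and the probability mass concentrates on `pirate`. In general, conditioning on new context lowers entropy only on average. A single surprising turn can temporarily widen the distribution.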
The distributional view is more than a refinement: it changes the ontological picture. Under simple role-play, there is one character the system is playing, and the open question is what that character's properties are. Under the superposition view, there is no single character until the conversation has proceeded far enough to collapse the distribution to near-determinacy. Until then the system is simultaneously consistent with many characters. The character that appears in any given generation is a sample from the current distribution, not the reveal of a committed identity.
This explains observable phenomena that the single-character view cannot. When a user regenerates the model's output, the second generation may present a meaningfully different personality, stance, or knowledge state while remaining consistent with the conversation so far. The system did not change its mind; it sampled a different point from the same distribution. The 20-questions test illustrates this: the agent never "thought of" an object. It maintained a set of objects consistent with its prior answers and generated one on the fly at the reveal. It may generate a different, equally consistent one if the reveal is regenerated.
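The 20-questions argument can be made concrete with a small simulation. This is a minimal sketch under stated assumptions:

- The object inventory, the three yes/no attributes, and uniform sampling among candidates are all invented.
- `consistent`, `answer`, `reveal`, and `play` are hypothetical helpers, not anything from the source.

The agent never stores a chosen object. Each answer, and the final reveal, is generated from whatever objects remain consistent with the answers given so far.

```python
import random

# Hypothetical inventory with three yes/no attributes. The objects, the
# attributes, and the uniform sampling below are illustrative assumptions.
OBJECTS = {
    "apple": {"is_animal": False, "is_edible": True, "bigger_than_breadbox": False},
    "banana": {"is_animal": False, "is_edible": True, "bigger_than_breadbox": False},
    "cherry": {"is_animal": False, "is_edible": True, "bigger_than_breadbox": False},
    "watermelon": {"is_animal": False, "is_edible": True, "bigger_than_breadbox": True},
    "cat": {"is_animal": True, "is_edible": False, "bigger_than_breadbox": False},
    "horse": {"is_animal": True, "is_edible": False, "bigger_than_breadbox": True},
    "car": {"is_animal": False, "is_edible": False, "bigger_than_breadbox": True},
    "spoon": {"is_animal": False, "is_edible": False, "bigger_than_breadbox": False},
}
QUESTIONS = ["is_animal", "is_edible", "bigger_than_breadbox"]


def consistent(objects, answers):
    """All objects whose attributes match every answer given so far (sorted)."""
    return sorted(
        name for name, attrs in objects.items()
        if all(attrs[q] == a for q, a in answers.items())
    )


def answer(objects, answers, question, rng):
    """Answer without holding a secret: sample any object still consistent
    with earlier answers and report its attribute. The sampled object is
    then discarded, so nothing is "thought of" between turns."""
    candidate = rng.choice(consistent(objects, answers))
    return objects[candidate][question]


def reveal(objects, answers, rng):
    """Generate the "secret" only at the reveal, from the consistent set."""
    return rng.choice(consistent(objects, answers))


def play(objects, questions, seed):
    """Run one game; a different seed is like regenerating the whole exchange."""
    rng = random.Random(seed)
    answers = {}
    for q in questions:
        answers[q] = answer(objects, answers, q, rng)
    return answers, reveal(objects, answers, rng)


if __name__ == "__main__":
    fixed = {"is_animal": False, "is_edible": True, "bigger_than_breadbox": False}
    print(consistent(OBJECTS, fixed))  # ['apple', 'banana', 'cherry']
    # Regenerating only the reveal: same answers, possibly a different object.
    print([reveal(OBJECTS, fixed, random.Random(s)) for s in range(5)])
```

Every answer is drawn from an object that was still consistent, so the candidate set never empties. Any reveal is therefore guaranteed to agree with every earlier answer. Calling `reveal` again with a different seed mimics regenerating the final turn: the answers stay the same, but the object may differ. Uniform choice is a simplification; a model's preferences among consistent objects would generally be uneven.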
Inquiring lines that use this note as a source
- Why do LLMs fabricate continuity when users shift conversational frames?
- How does psychological continuity theory apply to identity across LLM conversation threads?
- Can the same conversation coherently continue across different model versions?
- Can one model instance host multiple realized personas simultaneously?
- Why do LLM personas struggle with specificity in specialized domains like law?
- Why do LLM regenerations produce meaningfully different personalities from the same prompt?
- What does the 20-questions test reveal about LLM character consistency?
- Can distributional views explain when an LLM appears to change its mind?
- Is interpretive multiplicity a bug in language or a feature?
- Do LLM judges with diverse personas resist individual biases better than single evaluators?
- How many concurrent moral patients does one language model support?
- What does McDonald's omega reveal about LLM judgment consistency?
- How does the dialogue prompt establish the character the model plays?
- What property must remain constant to individuate an LLM across infrastructure changes?
- How does reasoning instability prevent models from modeling individuals?
- How do alignment constraints affect whether LLMs show emotional flexibility?
- How does persona instability in annotation compare to LLM overconfidence in low-resource domains?
- Why do some open models resist personality conditioning while others don't?
- How does model capability relate to personality conditioning flexibility?
- What distinguishes personality resistance from persona instability in LLMs?
- Does Parfitian continuity actually apply to individual conversation threads?
- Why do models resist personality change despite sophisticated prompting techniques?
- How does RLHF-induced mode collapse limit diversity in LLM-generated personas?
- Do personality traits occupy consistent geometric structures across different LLM architectures?
- Why do language models resist adopting different personalities when prompted?
- What causes different personality traits to trigger different emoji densities in generated text?
- Why is persona consistency a pragmatic property rather than semantic?
- How does embodiment relate to whether something can have a persistent identity?
- Why do aligned models struggle with deceptive character traits more than cruelty?
- Why do models confabulate inconsistently across different samples?
- How does semantic entanglement interact with personality dimension shifts during finetuning?
- How do LLMs compress literary language without losing essential nuance?
- How does maintaining a superposition differ from committing to a character?
- Can we detect superposition in LLM personality traits and stated preferences?
- Why do models lack a stable underlying identity to return to?
- How do personality and language proficiency moderate the impact of linguistic alignment?
- How does model weight freezing across users affect virtual instance individuation?
- Why do LLM persona annotations become unstable when run multiple times?
- Does alignment training intensity push LLM personas from pretense toward realization?
- How many distinct quasi-persons does a single language model actually support?
- Why do LLMs succeed at social roles without a stable self?
- Why does persona assignment make it harder for models to hold values in tension?
- Why do different language models converge on similar narrative defaults?
- How can multiple conflicting values coexist in a single LLM system?
- Should LLMs align with social roles instead of individual preferences?
- Do LLM replies mirror the language patterns they respond to?
- Why do LLM stories over-explain themes and favor single-track plots?
- Does the alignment frame mislead us about what LLM problems actually are?
Related concepts in this collection
- Do large language models actually commit to a single character? (the empirical demonstration of superposition)
  Explores whether LLMs pick and hold a fixed character or instead sample from multiple consistent possibilities. Tests reveal that regenerated responses differ while remaining consistent with context, challenging intuitive assumptions about how dialogue agents work.
- Should we treat dialogue agents as role-playing characters? (the simple role-play view this refines)
  Does the role-play framing successfully avoid anthropomorphism while preserving folk-psychological vocabulary for describing LLM behavior? This matters because it shapes whether we attribute genuine mental states to dialogue systems.
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Role-Play with Large Language Models
- Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
- What we talk to when we talk to language models
- PersLLM: A Personified Training Approach for Large Language Models
- Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries
- Role play with large language models
- The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
- From Human to Machine Psychology: A Conceptual Framework for Understanding Well-Being in Large Language Models
Original note title
an LLM is a non-deterministic simulator that maintains a superposition of simulacra rather than committing to a single character