Do large language models actually commit to a single character?
Explores whether LLMs pick and hold a fixed character or instead sample from multiple consistent possibilities. Tests reveal that regenerated responses differ while remaining consistent with context, challenging intuitive assumptions about how dialogue agents work.
Shanahan constructs a simple but decisive behavioral test. Have an LLM-based dialogue agent play 20 questions: the agent "thinks of" an object and the user asks yes/no questions. After several rounds, ask the agent to reveal the object. It names something consistent with all previous answers. Now regenerate that response, leaving the conversation history unchanged. The agent may name a different object, one that is also consistent with all previous answers.
This phenomenon is incompatible with any view that treats the agent as having committed to a specific object at the start of the game. A human playing 20 questions picks an object, holds it in mind, and answers questions from that fixed commitment. The LLM never picks. It maintains a set of objects consistent with the accumulated constraints, which Shanahan calls a superposition, and samples from that set at the moment of reveal. The same logic extends from objects to characters: the agent never commits to being a specific character with specific properties. It maintains a distribution over the characters consistent with the conversation so far and generates behavior sampled from that distribution.
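The contrast between commitment and superposition can be sketched as a toy simulation. This is a minimal illustration, not a model of how an LLM works internally: the object table, the attribute names, and the `CommittedPlayer` and `SuperpositionPlayer` classes are all hypothetical, and uniform sampling over the consistent set is an assumption made for simplicity.

```python
import random

# Hypothetical toy world: each object is described by yes/no attributes.
OBJECTS = {
    "cat":      {"alive": True,  "bigger_than_breadbox": False, "man_made": False},
    "mouse":    {"alive": True,  "bigger_than_breadbox": False, "man_made": False},
    "horse":    {"alive": True,  "bigger_than_breadbox": True,  "man_made": False},
    "teaspoon": {"alive": False, "bigger_than_breadbox": False, "man_made": True},
    "piano":    {"alive": False, "bigger_than_breadbox": True,  "man_made": True},
}


def consistent_objects(answers):
    """Return every object compatible with all answers given so far."""
    return sorted(
        name
        for name, attrs in OBJECTS.items()
        if all(attrs[question] == answer for question, answer in answers.items())
    )


class CommittedPlayer:
    """Human-style play: pick one object up front and answer from it."""

    def __init__(self, rng):
        self.secret = rng.choice(sorted(OBJECTS))

    def answer(self, question):
        return OBJECTS[self.secret][question]

    def reveal(self, answers, rng):
        # The reveal ignores the random source: regeneration cannot change it.
        return self.secret


class SuperpositionPlayer:
    """Never picks: the reveal is sampled from whatever remains consistent."""

    def reveal(self, answers, rng):
        # Assumption: uniform sampling over the consistent set.
        return rng.choice(consistent_objects(answers))


if __name__ == "__main__":
    # The dialogue so far, acting as the fixed context for every regeneration.
    answers = {"alive": True, "bigger_than_breadbox": False}
    player = SuperpositionPlayer()
    for seed in range(5):
        # Each seed stands in for one press of the "regenerate" button.
        print(seed, player.reveal(answers, random.Random(seed)))
```

In this sketch the only state the superposition player carries between turns is the dialogue itself (`answers`), which is why regenerating the reveal can yield a different object without contradicting any earlier answer.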
The test is portable. Any feature that appears settled in one generation but changes on regeneration (while remaining consistent with context) is evidence of superposition rather than commitment. This has been observed in personality traits, stated preferences, claimed memories, and emotional dispositions of dialogue agents. The philosophical consequence is that attributing fixed psychological properties to an LLM conversation state is a category mistake: the system has a distribution over properties, not a single property. What appears stable is a high-probability region of the distribution, not a fact about an underlying entity.
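One way to make the portable version of the test concrete is a small harness that regenerates a response several times and classifies the feature under study. This is a hedged sketch: `regeneration_test`, the verdict labels, and the stub generators are hypothetical, and a real experiment would replace the stubs with calls to an actual dialogue agent under a fixed conversation history.

```python
import random


def regeneration_test(generate, is_consistent, n=50, seed=0):
    """Regenerate a response n times and classify the observed feature.

    generate(rng) returns the feature value for one regeneration.
    is_consistent(value) checks that value against the fixed context.
    """
    rng = random.Random(seed)
    values = [generate(rng) for _ in range(n)]
    if not all(is_consistent(value) for value in values):
        return "inconsistent"   # some regeneration contradicts the context
    if len(set(values)) > 1:
        return "superposition"  # varies across regenerations, yet consistent
    return "stable"             # no variation seen in n samples


# Hypothetical context: the agent has already said its favourite colour is not red.
def not_red(colour):
    return colour != "red"


def sampled_preference(rng):
    # Stub agent with a dominant (high-probability) answer and a rarer one.
    return rng.choices(["green", "blue"], weights=[0.7, 0.3])[0]


def fixed_preference(rng):
    # Stub agent that returns the same answer on every regeneration.
    return "green"


if __name__ == "__main__":
    print(regeneration_test(sampled_preference, not_red))
    print(regeneration_test(fixed_preference, not_red))
```

A "stable" verdict only means that no variation appeared in `n` samples. It is compatible with a distribution concentrated on one value, so it does not by itself establish commitment.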
Related concepts in this collection 2
- Does an LLM commit to a single character or maintain many?
  Explores whether language models lock into one personality or instead hold multiple consistent characters in a probability distribution that narrows over time. Matters because it changes how we interpret apparent inconsistencies in model behavior.
  the theoretical claim this test supports
- Should we call LLM errors hallucinations or fabrications?
  Does the language we use to describe LLM failures shape the technical solutions we build? Examining whether perceptual and psychological frameworks misdiagnose what's actually happening.
  parallel: output is produced at generation time, not retrieved from a stored state
Original note title
the 20-questions regeneration test falsifies any committed-character view of LLM behavior