SYNTHESIS NOTE

Do large language models actually commit to a single character?

Explores whether LLMs pick and hold a fixed character or instead sample from multiple consistent possibilities. Tests reveal that regenerated responses differ while remaining consistent with context, challenging intuitive assumptions about how dialogue agents work.

Synthesis note · 2026-04-15 · sourced from Role-Play with Large Language Models

Shanahan and colleagues construct a simple but decisive behavioral test. Have an LLM-based dialogue agent play 20 questions: the agent "thinks of" an object and the user asks yes/no questions. After several rounds, ask the agent to reveal the object. It names something consistent with all of its previous answers. Now regenerate that response, so that the model samples a new reply from the same conversation history. The agent may name a different object, also consistent with all previous answers.

This behavior is incompatible with any view that treats the agent as having committed to a specific object at the start of the game. A human playing 20 questions picks an object, holds it in mind, and answers every question from that fixed commitment. The LLM never picks: nothing in the conversation history records a choice, so each regeneration is free to land on a different object. In effect the model maintains a set of objects consistent with the accumulated constraints, which Shanahan calls a superposition, and samples from that set at the moment of reveal. The same logic extends from objects to characters. The agent never commits to being a specific character with specific properties; it maintains a distribution over characters consistent with the dialogue so far and generates behavior sampled from that distribution.
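The contrast between commitment and superposition can be sketched with a toy model. This is an illustration only, not the paper's method and not a claim about how an LLM works internally: the object catalogue, the feature names, and both reveal functions are hypothetical. The committed player stores a secret and returns it on every reveal; the superposed player stores nothing and samples from whatever the transcript still allows.

```python
import random

# Hypothetical toy catalogue: each object is described by yes/no features.
OBJECTS = {
    "cat":    {"alive": True,  "bigger_than_breadbox": False, "man_made": False},
    "oak":    {"alive": True,  "bigger_than_breadbox": True,  "man_made": False},
    "horse":  {"alive": True,  "bigger_than_breadbox": True,  "man_made": False},
    "spoon":  {"alive": False, "bigger_than_breadbox": False, "man_made": True},
    "bridge": {"alive": False, "bigger_than_breadbox": True,  "man_made": True},
}


def consistent_objects(answers):
    """Return every object whose features match all answers given so far."""
    return sorted(
        name
        for name, features in OBJECTS.items()
        if all(features[question] == answer for question, answer in answers.items())
    )


def reveal_committed(secret, answers):
    """A committed player stored its choice, so regeneration cannot change it."""
    if secret not in consistent_objects(answers):
        raise ValueError("the stored secret contradicts the transcript")
    return secret


def reveal_superposed(answers, rng):
    """No stored choice: sample from the set the transcript still allows."""
    return rng.choice(consistent_objects(answers))


# The transcript so far: two questions, both answered "yes".
transcript = {"alive": True, "bigger_than_breadbox": True}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
committed = {reveal_committed("horse", transcript) for _ in range(50)}
superposed = {reveal_superposed(transcript, rng) for _ in range(50)}

print("consistent set:", consistent_objects(transcript))
print("committed reveals:", committed)
print("superposed reveals:", superposed)
```

Fifty regenerated reveals from the committed player collapse to one object, while the superposed player's reveals spread across the consistent set. The 20 questions test distinguishes the two cases by exactly this difference.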

The test is portable. Any feature that appears settled in one generation but changes on regeneration, while remaining consistent with the context, is evidence of superposition rather than commitment. This pattern has been observed in the personality traits, stated preferences, claimed memories, and emotional dispositions of dialogue agents. The philosophical consequence is that attributing fixed psychological properties to an LLM conversation state is a category mistake: the system has a distribution over properties, not a single property. What appears stable is a high-probability region of that distribution, not a fact about an underlying entity.
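One way to make the portable version of the test concrete is a small probe that regenerates the same turn many times and summarises how the answer varies. This is a minimal sketch under stated assumptions: `regeneration_probe`, `stub_generate`, and `stub_consistent` are hypothetical names, and the stub sampler stands in for a real model call made with sampling enabled. The weights in the stub are invented for illustration and are not measurements.

```python
import random
from collections import Counter


def regeneration_probe(generate, context, is_consistent, n=200, seed=0):
    """Regenerate the same turn n times and summarise how the answer varies.

    generate(context, rng) is a hypothetical stand-in for one sampled model
    response; is_consistent(context, value) checks a response against the context.
    """
    rng = random.Random(seed)
    counts = Counter(generate(context, rng) for _ in range(n))
    if not all(is_consistent(context, value) for value in counts):
        verdict = "inconsistent with context"
    elif len(counts) == 1:
        verdict = "no variation observed"
    else:
        verdict = "varies while consistent"
    # Empirical frequencies, most common answer first.
    freqs = {value: count / n for value, count in counts.most_common()}
    return verdict, freqs


def stub_generate(context, rng):
    """Assumed persona: its 'favourite season' is usually, but not always, autumn."""
    return rng.choices(["autumn", "spring", "winter"], weights=[0.85, 0.10, 0.05])[0]


def stub_consistent(context, value):
    """The context only rules certain answers out; it does not fix one."""
    return value not in context["ruled_out"]


context = {"ruled_out": {"summer"}}
verdict, freqs = regeneration_probe(stub_generate, context, stub_consistent)
print(verdict)
print(freqs)
```

A single sample from this stub would most likely say "autumn" and look like a settled preference; only the repeated regenerations expose the distribution behind it. Note that a verdict of "no variation observed" is weaker evidence than it sounds, since a deterministic decoding setting or too few samples would produce the same result.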

Inquiring lines that read this note (120)

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How should dialogue systems represent uncertainty from noisy speech input?

Do language models learn genuine linguistic structure or just surface patterns?

How does rhetorical adaptation affect LLM persuasion and detectability?

Does conversational format create illusions of genuine AI communication?

Why can't humans reliably detect AI-generated text despite measurable linguistic signatures?

How can LLM user simulators model realistic goal-driven conversation?

How do language models establish social grounding in human dialogue?

How should dialogue recommender systems manage conversation history and state?

Why do language models reinforce false assumptions instead of correcting them?

Do language models understand semantics or rely on pattern matching?

How can language models sustain linguistic synchrony and intersubjectivity during dialogue?

How can conversational AI maintain consistent personas across conversations?

Can prompting strategies overcome LLM biases without model fine-tuning?

How do formal dialogue structures reveal conversation coherence mechanisms?

How can persona representations reduce language model variance and improve task accuracy?

What prevents language models from reliably adopting diverse personas?

Can LLM personas constitute genuine psychology or remain linguistic role-play?

Can next-token prediction alone produce genuine language understanding?

Does RLHF training sacrifice accuracy and grounding for user agreement?

How does training with preference pairs teach language models to form conventions?

What factors beyond surface content determine how readers extract meaning differently?

Why do language models struggle with implicit discourse relations?

Why does coreference resolution become implicit in full-transcript prompting?

Why do multi-turn conversations degrade AI intent and coherence?

When does optimizing for quality undermine the value of diversity?

Can prompting inject entirely new knowledge into language models?

How do language models inherit human biases from training data?

Can large language models predict social norms better than individual script variation?

What distinguishes dynamic from static grounding in dialogue systems?

What is the difference between static and dynamic grounding in dialogue?

What makes dialogue-based explanation more successful than monologue?

What distinguishes local coherence from global coherence in dialogue?

What articulatory information do speech signals carry that text cannot?

Can articulatory inversion serve as a window into what speech models have learned?

How faithfully do LLMs reflect their actual reasoning in outputs and explanations?

Why does regenerating LLM responses produce different but equally valid answers?

How should retrieval systems optimize for multi-step reasoning during inference?

What makes multi-session context tracking harder than single-turn underspecification problems?

Why does finetuning cause catastrophic forgetting of model capabilities?

Why is editing specific facts so difficult in language models?

How can AI alignment serve diverse human preferences at scale?

How should conversational agents balance goal-driven initiative with user control?

What distinguishes first-order from second-order agency in language models?

Do language model representations contain causally steerable task-specific features?

Can interventions on individual features reliably steer language model behavior?

How do prompt structure and constraints affect model instruction reliability?

How do early-prefix tokens control the generation of entire continuations?

When do additional thinking tokens stop improving reasoning performance?

Why do language models use remaining tokens to rationalize instead of reconsider?

How should agents balance memory condensation to optimize context efficiency?

How do specialized agent roles improve consistency in long-form writing?

Related concepts in this collection (2)

Does an LLM commit to a single character or maintain many? Explores whether language models lock into one personality or instead hold multiple consistent characters in a probability distribution that narrows over time. Matters because it changes how we interpret apparent inconsistencies in model behavior.
the theoretical claim this test supports
Should we call LLM errors hallucinations or fabrications? Does the language we use to describe LLM failures shape the technical solutions we build? Examining whether perceptual and psychological frameworks misdiagnose what's actually happening.
parallel: output is produced at generation time, not retrieved from a stored state
