SYNTHESIS NOTE
Psychology, Society, and Alignment

Do large language models actually commit to a single character?

Explores whether LLMs pick and hold a fixed character or instead sample from multiple consistent possibilities. Tests reveal that regenerated responses differ while remaining consistent with context, challenging intuitive assumptions about how dialogue agents work.

Synthesis note · 2026-04-15 · sourced from Role-Play with Large Language Models

Shanahan constructs a simple but decisive behavioral test. Have an LLM-based dialogue agent play 20 questions: the agent "thinks of" an object and the user asks yes/no questions. After several rounds, ask the agent to reveal the object. It names something consistent with all previous answers. Now regenerate that response with sampling enabled (a nonzero temperature). The agent can name a different object, also consistent with all previous answers.

This result is incompatible with any view that treats the agent as having committed to a specific object at the start of the game. A human playing 20 questions picks an object, holds it in mind, and answers questions from that fixed commitment. The LLM never picks. In effect it maintains a set of objects consistent with the accumulated constraints (what Shanahan calls a superposition) and samples from that set at the moment of reveal. The same logic extends from objects to characters: the agent never commits to being a specific character with specific properties. It maintains a distribution over consistent characters and generates behavior sampled from that distribution.
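The contrast between commitment and superposition can be made concrete with a toy simulation. This is a minimal sketch, not a model of how an LLM works internally: the object table, the attribute names, and the functions `consistent_objects`, `answer`, `reveal`, and `play` are all hypothetical, and a seeded random choice stands in for sampling from the model.

```python
import random

# Hypothetical toy world: each object is described by yes/no attributes.
OBJECTS = {
    "cat":    {"alive": True,  "bigger_than_breadbox": False},
    "horse":  {"alive": True,  "bigger_than_breadbox": True},
    "teacup": {"alive": False, "bigger_than_breadbox": False},
    "spoon":  {"alive": False, "bigger_than_breadbox": False},
    "piano":  {"alive": False, "bigger_than_breadbox": True},
}

def consistent_objects(transcript):
    """Return the objects that agree with every (question, answer) pair so far."""
    return [name for name, attrs in OBJECTS.items()
            if all(attrs[q] == a for q, a in transcript)]

def answer(transcript, question, rng):
    """Answer with no stored secret: sample a still-consistent object, answer as it would."""
    candidate = rng.choice(consistent_objects(transcript))
    return OBJECTS[candidate][question]

def reveal(transcript, rng):
    """Sample the 'object I was thinking of' at the moment of reveal."""
    return rng.choice(consistent_objects(transcript))

def play(questions, rng):
    """Run a game; the transcript is the only state carried between turns."""
    transcript = []
    for q in questions:
        transcript.append((q, answer(transcript, q, rng)))
    return transcript

# A fixed dialogue so far: "Is it alive?" -> no, "Is it bigger than a breadbox?" -> no.
transcript = [("alive", False), ("bigger_than_breadbox", False)]

# Regenerating the reveal = sampling again with a different seed.
regenerated = {reveal(transcript, random.Random(seed)) for seed in range(50)}

# A committed player, by contrast, fixes a secret before the first question.
secret = "teacup"
committed = {secret for _ in range(50)}

print("superposed reveals:", sorted(regenerated))
print("committed reveals: ", sorted(committed))
```

In this sketch the only thing that persists across turns is the transcript, so every regenerated reveal is drawn from the objects the transcript still allows. The committed player gives the same answer on every regeneration because the secret is part of its state.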

The test is portable. Any feature that appears settled in one generation but changes on regeneration (while remaining consistent with context) is evidence of superposition rather than commitment. This has been observed in personality traits, stated preferences, claimed memories, and emotional dispositions of dialogue agents. The philosophical consequence is that attributing fixed psychological properties to an LLM conversation state is a category mistake: the system has a distribution over properties, not a single property. What appears stable is a high-probability region of the distribution, not a fact about an underlying entity.
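The portable form of the test can be sketched as a small harness. This is an illustration under stated assumptions, not an established tool: `regeneration_test` is a hypothetical helper, `generate(context, seed)` stands for whatever call regenerates a response from the same context, and `is_consistent(context, output)` is a check the experimenter has to supply. The stub generator below is invented so that the example runs on its own.

```python
import random
from collections import Counter

def regeneration_test(generate, context, is_consistent, n_samples=20):
    """Regenerate a response n_samples times from one fixed context and summarize the outcome."""
    outputs = [generate(context, seed) for seed in range(n_samples)]
    counts = Counter(outputs)
    return {
        "counts": counts,
        # Every distinct output must still fit what the context has established.
        "all_consistent": all(is_consistent(context, out) for out in counts),
        # More than one distinct output means the feature was not fixed.
        "varies": len(counts) > 1,
    }

# --- Hypothetical stand-ins so the sketch is runnable without a model ---

def stub_generate(context, seed):
    """Assumed stand-in for a model call: sample one answer the context allows."""
    return random.Random(seed).choice(context["allowed"])

def stub_is_consistent(context, output):
    """Assumed consistency check: the answer must be one the context allows."""
    return output in context["allowed"]

# Invented example: the persona earlier said it dislikes cold weather,
# and is now asked for its favourite season.
context = {"question": "favourite season", "allowed": ["spring", "summer"]}

result = regeneration_test(stub_generate, context, stub_is_consistent)
print(result["counts"], result["all_consistent"], result["varies"])
```

A result with `varies` and `all_consistent` both true is the signature described above. A result where `varies` is false does not by itself show commitment, since a sharply peaked distribution would look the same at a small sample size.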

Original note title

the 20-questions regeneration test falsifies any committed-character view of LLM behavior