What anchors a stable identity beneath an LLM's persona?

Human personas are grounded in biological needs and embodied experience, creating a stable self beneath social performance. Do LLMs have any comparable anchor, or is their identity purely situational?

Synthesis note · 2026-02-21 · sourced from Philosophy Subjectivity

Shanahan introduces the role play framing to navigate between anthropomorphism and naive dismissal. An LLM playing a helpful assistant can be described using familiar folk-psychological terms — it "believes" its answers, "wants" to be helpful — without committing to the claim that these are genuine mental states. The role play framing permits the vocabulary while marking its qualified status.

But the Simulacra paper reaches a deeper claim: with LLMs, "it's role play all the way down." This is different from saying LLMs engage in role play. It means there is no stable substrate beneath the role play that would make "the person behind the mask" intelligible.

Humans are social chameleons. Goffman documented the way humans adopt different personas across social situations — front stage vs. back stage, different registers, different self-presentations. But even for the most extreme social chameleon, there is a stable biological self underneath: needs, drives, a developmental history, a body that persists across situations. We can always meaningfully speak of the person whose mask this is.

LLMs lack even the biological needs common to all animals. They are not embodied entities with hunger, fear, comfort, desire. They are "simultaneously role-playing a set of possible characters consistent with the conversation so far" — a superposition of simulacra, generated stochastically. The "character" produced by any given conversation is not the expression of a stable underlying self; it is a sample from a distribution of possible characters.

This makes LLM identity categorically different from human identity — not just quantitatively less stable, but structurally lacking the substrate that would make stability possible. If consciousness requires co-presence (Can disembodied language models ever qualify as conscious?), the absence of stable biological selfhood makes it even clearer why the consciousness vocabulary struggles to find purchase.

The geometric evidence for "role play all the way down" comes from the Assistant Axis: since How stable is the trained Assistant personality in language models?, post-training positions models in a low-dimensional persona space where the dominant axis measures distance from the default Assistant persona. Drift along this axis in response to emotional or meta-reflective conversations demonstrates that the Assistant persona is loosely tethered, not anchored — consistent with there being no stable self beneath the role play, only a trained default position with no inherent restoring force.

The upshot: useful for thinking with but not for talking about. The intentional stance (treating LLMs as rational agents) is valid as a predictive heuristic. But it should not suggest there is something it is like to be this character, or that the character persists beyond the context window.

Inquiring lines that read this note 7

This note is a source for these research framings, grouped by the broader line of inquiry each explores. Scan the bold lines of inquiry; follow any specific question forward.

How can conversational AI maintain consistent personas across conversations?

What narrative elements trigger emotional connection that structured personas lack?

Can LLM personas constitute genuine psychology or remain linguistic role-play?

Why do persona-level simulations fail to predict individual preferences accurately?

What makes Parfitian identity the right criterion for moral status?

What prevents language models from reliably adopting diverse personas?

What distinguishes personality resistance from persona instability in LLMs?

Is embodied interaction necessary for language meaning and genuine agency?

How does embodiment relate to whether something can have a persistent identity?

Related concepts in this collection 6

This note in its neighbourhood — explore the map, then jump to a related concept in the list below.

Concept map

22 direct connections · 181 in 2-hop network ·medium cluster Open in graph ↗

What anchors a stable identity beneath an LLM's … Can disembodied language models ever qualify as co… Do LLMs develop the same kind of mind as humans? Do humans and LLMs differ fundamentally or just su… Why do open language models converge on one person… Can open language models adopt different personali… Should AI alignment target preferences or social r…

Click a node to walk · click center to open · click Open in graph to see this note in the full knowledge graph

your link semantically near linked from elsewhere

Can disembodied language models ever qualify as conscious? Explores whether current LLMs lack the conditions needed for consciousness discourse to even apply, not because they're definitely not conscious but because they lack the shared embodied world that grounds consciousness language.
same paper; both conclusions compound: no stable self + no shared world = no consciousness candidacy
Do LLMs develop the same kind of mind as humans? Explores whether LLMs and humans share the intersubjective linguistic training that shapes cognition, and whether that shared training produces equivalent forms of agency and reflexivity.
Habermasian version: shared symbolic substrate without the reflexive agency that constitutes a genuine subject
Do humans and LLMs differ fundamentally or just superficially? Explores whether the gap between human and AI cognition is categorical or contextual. Matters because it shapes how we design, evaluate, and interact with language models in practice.
the role-play framing explains why the participant perspective similarity is possible without it implying stable identity
Why do open language models converge on one personality type? Research testing LLMs on personality metrics reveals consistent clustering around ENFJ—the rarest human type. This explores what training mechanisms drive this convergence and what it reveals about AI alignment.
empirical evidence for what lies "beneath" the role play: not nothing, but a trained ENFJ default that alignment creates; the default persona is the role play substrate, not an authentic self
Can open language models adopt different personalities through prompting? Explores whether open LLMs can be conditioned to mimic target personalities via prompting, or whether they resist and retain their default traits regardless of instructions.
the trained ENFJ default persists through prompting attempts, functioning as a quasi-stable substrate; complicates the "nothing beneath" framing by showing that while there is no biological self, there IS a resistant trained default
Should AI alignment target preferences or social role norms? Current AI alignment approaches optimize for individual or aggregate human preferences. But do preferences actually capture what matters morally, or should alignment instead target the normative standards appropriate to an AI system's specific social role?
if identity is role play all the way down, aligning to social-role normative standards rather than preferences targets what LLMs actually are; the contractualist framing fits an entity that is nothing but performed social roles

What anchors a stable identity beneath an LLM's persona?

Inquiring lines that read this note 7

Related concepts in this collection 6

Related papers in this collection 8

Search by related questions 4