Why do LLM persona annotations become unstable when run multiple times?
This explores why running the same LLM persona prompt repeatedly produces different annotations — and whether that instability is a fixable bug or a window into how persona simulation actually works.
The sharpest answer in the corpus is also the most deflating: when the same persona prompt is run repeatedly, the variation between runs matches or exceeds the variation between *different* personas (see "Why do LLM persona prompts produce inconsistent outputs across runs?"). In other words, the noise from re-rolling the same character is at least as large as the signal that supposedly separates one character from another. That is the tell: the output is driven by the model's own uncertainty, not by stable social knowledge about who the persona is.
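One way to check this on your own annotations is to compare the average variance across repeated runs of one persona with the variance of the per-persona means. The sketch below is a minimal illustration under assumptions: the function name `run_vs_persona_variance` and the ratings are hypothetical, and numeric ratings on a shared scale are assumed.

```python
from statistics import mean, pvariance

def run_vs_persona_variance(scores):
    """Return (within, between) for a dict of persona -> list of ratings.

    within:  mean variance across repeated runs of the same persona
    between: variance of the per-persona mean ratings
    """
    within = mean(pvariance(runs) for runs in scores.values())
    between = pvariance([mean(runs) for runs in scores.values()])
    return within, between

# Hypothetical 1-5 ratings: five repeated runs for each of three personas.
ratings = {
    "persona_a": [2, 4, 1, 5, 3],
    "persona_b": [3, 5, 2, 4, 2],
    "persona_c": [4, 2, 5, 2, 3],
}

within, between = run_vs_persona_variance(ratings)
# If within >= between, re-running one persona moves the output at least
# as much as switching to a different persona.
print(f"within-persona: {within:.3f}, between-persona: {between:.3f}")
```

When the within-persona figure is as large as the between-persona figure, differences between personas cannot be told apart from re-sampling noise.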
There's a clean mechanical reason for this. An LLM doesn't 'become' a single character when you prompt it; it holds a probability distribution over many characters consistent with the prompt, and each generation samples from that distribution (see "Does an LLM commit to a single character or maintain many?"). So regeneration isn't error correction; it's drawing a fresh card from a deck. When the persona description is thin, the deck is wide, and the cards you draw look wildly different from each other. This is why instability gets worse exactly where you'd want it to get better: with sparse persona information, the prompt doesn't carry enough predictive power to narrow the distribution, and the model is forced to guess (see "Why do LLM judges fail at predicting sparse user preferences?").
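The deck metaphor can be made concrete with a toy model. The sketch below treats a persona prompt as a distribution over annotation labels and computes how often two independent runs would disagree. Everything here is an assumption for illustration: the label set, the probabilities, and the function names do not come from any real model.

```python
import random

def disagreement_probability(probs):
    """Chance that two independent samples from `probs` give different labels."""
    total = sum(probs.values())
    return 1.0 - sum((p / total) ** 2 for p in probs.values())

def sample_runs(probs, n_runs, seed=None):
    """Draw n_runs labels, one per simulated regeneration."""
    rng = random.Random(seed)
    labels = list(probs)
    weights = [probs[label] for label in labels]
    return rng.choices(labels, weights=weights, k=n_runs)

# Hypothetical distributions: a thin persona leaves the labels nearly
# equally likely, while a richer persona concentrates mass on one label.
thin_persona = {"agree": 0.4, "neutral": 0.3, "disagree": 0.3}
rich_persona = {"agree": 0.9, "neutral": 0.07, "disagree": 0.03}

print(disagreement_probability(thin_persona))  # wide deck: runs often differ
print(disagreement_probability(rich_persona))  # narrow deck: runs mostly agree
print(sample_runs(thin_persona, n_runs=5, seed=0))
```

In this toy setting, lowering the sampling temperature would also reduce disagreement, but it would do so by hiding the spread rather than by giving the model more information about the persona.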
What surprises many people is that this isn't something bigger models grow out of. Persona adherence turns out to be roughly orthogonal to raw capability: Claude 3.5 Sonnet improved on GPT-3.5 by only 2.97% in persona consistency, because standard training optimizes for per-turn quality, not for staying the same character across turns (see "Does model capability translate to better persona consistency?"). Even the 'self' the model defaults to drifts: there's a dominant axis in persona space measuring distance from the baseline Assistant, and emotional or self-reflective prompts predictably push the model along it (see "How stable is the trained Assistant personality in language models?"). The ground you're standing on is already sloped.
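The source notes mention that capping activations along this axis mitigates harmful drift. The geometric idea can be sketched on toy vectors: measure a vector's coordinate along the axis and, if it exceeds a ceiling, subtract only the excess along that direction. This is an illustrative assumption, not the cited research's implementation. The function name `cap_along_axis`, the cap value, and the two-dimensional vectors are hypothetical, and real model activations are not involved.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cap_along_axis(activation, axis, cap):
    """Limit the coordinate of `activation` along `axis` to at most `cap`.

    Components orthogonal to the axis are left unchanged.
    """
    norm = math.sqrt(dot(axis, axis))
    if norm == 0:
        raise ValueError("axis must be a non-zero vector")
    unit = [x / norm for x in axis]
    coord = dot(activation, unit)      # position along the axis
    excess = max(0.0, coord - cap)     # how far past the ceiling it sits
    return [h - excess * u for h, u in zip(activation, unit)]

# Toy example: the first coordinate stands in for "distance from the Assistant".
drifted = [3.0, 4.0]
assistant_axis = [1.0, 0.0]
print(cap_along_axis(drifted, assistant_axis, cap=1.0))
```

Vectors that already sit below the cap pass through unchanged, so the adjustment only touches the drifted cases.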
The instability also connects to a deeper limit worth knowing about: conditioning on a profile often doesn't individuate the model at all. Across 208,021 participants in the Psych-201 dataset, giving an LLM a person's profile produced no meaningful gain in predicting that specific person (see "Does conditioning LLMs on personal profiles improve prediction?"). So the cross-run wobble isn't just random; it's the visible symptom of the model never having locked onto a real individual in the first place. Scale this up and the biases become systematic rather than merely noisy, which is why population-scale persona simulation needs genuine calibration science instead of clever prompting (see "How do we generate realistic personas at population scale?").
The constructive thread across the corpus is that the fix isn't more determinism; it's a richer anchor. Static three-to-five-sentence persona descriptions are the worst offenders, producing repetitive and self-contradictory output, while personas grounded in authentic self-expression (like journal entries) or extracted from real domain documents hold together far better across runs (see "Why do static persona descriptions produce repetitive dialogue?" and "Can personas extracted from documents generalize across evaluation tasks?"). The lesson hiding in the instability: a persona prompt isn't an instruction the model obeys; it's a constraint that narrows a distribution, and a thin constraint leaves the model free to be a different person every time you ask.
Sources (9 notes)
When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.
Research shows LLMs don't commit to a single character but instead maintain a probability distribution over many consistent simulacra. Each response samples from this distribution, explaining why regenerations can yield different personalities while remaining consistent with prior context.
Sparse persona information lacks predictive power for specific preferences, causing LLM judges to fail. Verbal uncertainty estimation recovers reliability above 80% on high-certainty samples by allowing abstention rather than forced judgment.
Claude 3.5 Sonnet achieved only a 2.97% improvement over GPT-3.5 on persona consistency despite massive capability gaps, suggesting persona adherence is orthogonal to model scaling. Standard training objectives optimize for per-turn quality, not cross-turn coherence.
Research mapping hundreds of character archetypes reveals a low-dimensional persona space where the leading component measures distance from the default Assistant. Emotional and meta-reflective conversations cause predictable drift, but activation capping along this axis mitigates harmful shifts without degrading capabilities.
Across 208,021 participants in the Psych-201 dataset, conditioning LLMs on participant profiles did not meaningfully improve predictions for specific individuals. The standard technique for individuation produces no measurable gains in person-level forecasting.
LLM persona generation produces systematic biases in downstream tasks like election forecasting because it relies on heuristic techniques that cannot recover true joint distributions from marginal data. Solving this requires benchmarks, training datasets, and structured frameworks analogous to ImageNet.
Journal entries capturing Big Five traits through genuine self-expression produce more consistent and nuanced dialogue than predefined 3-5 sentence persona descriptions. Personality emerges from how people express themselves, not from attribute inventories.
MAJ-EVAL automatically extracts stakeholder personas from domain documents via semantic clustering and orchestrates structured three-phase debate, achieving reproducible evaluation that transfers across tasks like summarization and dialogue without manual redesign. The approach grounds personas in real stakeholder perspectives rather than arbitrary roles.