INQUIRING LINE

Why do LLM persona annotations become unstable when run multiple times?

This explores why running the same LLM persona prompt repeatedly produces different annotations — and whether that instability is a fixable bug or a window into how persona simulation actually works.


The sharpest answer in the corpus is also the most deflating: when you run the same persona prompt over and over, the variation between runs matches or exceeds the variation between *different* personas (see: Why do LLM persona prompts produce inconsistent outputs across runs?). In other words, the noise of re-rolling the same character is as large as the signal that supposedly separates one character from another. That's the tell: the output is being driven by the model's own uncertainty, not by stable social knowledge about who the persona is.
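As a minimal sketch of that comparison, the snippet below contrasts the spread across repeated runs of one persona with the spread across persona averages. The persona names, the 1-5 rating scale, and the numbers are made up for illustration, and `variance_breakdown` is a hypothetical helper, not a method taken from the cited notes.

```python
from statistics import mean, pvariance

def variance_breakdown(ratings):
    """ratings maps a persona name to its numeric labels from repeated runs."""
    per_persona_var = [pvariance(runs) for runs in ratings.values()]
    per_persona_mean = [mean(runs) for runs in ratings.values()]
    # Within-persona: average variance across re-runs of the same prompt.
    within = mean(per_persona_var)
    # Between-persona: variance of the per-persona average labels.
    between = pvariance(per_persona_mean)
    return within, between

# Illustrative (invented) ratings on a 1-5 scale, five runs per persona.
ratings = {
    "persona_a": [2, 4, 1, 5, 3],
    "persona_b": [2, 3, 5, 1, 4],
    "persona_c": [3, 5, 2, 4, 5],
}

within, between = variance_breakdown(ratings)
print(f"within-persona variance:  {within:.2f}")
print(f"between-persona variance: {between:.2f}")
if within >= between:
    print("re-run noise is at least as large as the persona signal")
else:
    print("personas are separable above the re-run noise")
```

If the within-persona figure is at least as large as the between-persona figure on real annotations, the persona labels are not carrying separable signal, which is the pattern the paragraph above describes.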

There's a clean mechanical reason for this. An LLM doesn't 'become' a single character when you prompt it; it holds a probability distribution over many characters consistent with the prompt, and each generation samples from that distribution (see: Does an LLM commit to a single character or maintain many?). So regeneration isn't error correction; it's drawing a fresh card from a deck. When the persona description is thin, the deck is wide, and the cards you draw look wildly different from each other. This is why instability gets worse exactly where you'd want it to get better: with sparse persona information, the prompt simply doesn't carry enough predictive power to narrow the distribution, and the model is forced to guess (see: Why do LLM judges fail at predicting sparse user preferences?).
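The deck metaphor can be made concrete with a toy model. This is only a sketch under stated assumptions: the character labels and weights are invented, and a real LLM's distribution over characters is not directly observable like this. It shows that a flatter distribution (a thin prompt) has higher entropy, so repeated draws land on more distinct characters than draws from a peaked one (a rich prompt).

```python
import math
import random

def entropy(weights):
    """Shannon entropy in bits of a categorical distribution given raw weights."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)

def sample_runs(characters, weights, n_runs, seed=0):
    """Simulate n_runs regenerations, each drawing one character."""
    rng = random.Random(seed)
    return rng.choices(characters, weights=weights, k=n_runs)

# Hypothetical candidate characters consistent with a persona prompt.
characters = ["strict", "lenient", "sarcastic", "literal", "anxious"]
thin_prompt = [1, 1, 1, 1, 1]    # sparse description: nothing is ruled out
rich_prompt = [20, 1, 1, 1, 1]   # detailed description: one reading dominates

for name, weights in [("thin", thin_prompt), ("rich", rich_prompt)]:
    draws = sample_runs(characters, weights, n_runs=10)
    print(f"{name}: entropy={entropy(weights):.2f} bits, "
          f"distinct characters in 10 runs={len(set(draws))}")
```

Lowering the sampling temperature would make the draws repeat more often, but it would not add the missing information that distinguishes one character from another; only a more constraining prompt narrows the distribution itself.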

What surprises many people is that this isn't something bigger models grow out of. Persona adherence turns out to be roughly orthogonal to raw capability: Claude 3.5 Sonnet improved on GPT 3.5 by only 2.97% in persona consistency despite the large capability gap, because standard training optimizes for sounding good turn-by-turn, not for staying the same character across turns (see: Does model capability translate to better persona consistency?). And even the 'self' the model defaults to drifts: there's a dominant axis in persona space measuring distance from the baseline Assistant, and emotional or self-reflective prompts predictably push the model along it (see: How stable is the trained Assistant personality in language models?). The ground you're standing on is already sloped.

The instability also connects to a deeper limit worth knowing about: conditioning on a profile often doesn't actually individuate the model at all. Across 208,021 participants in the Psych-201 dataset, giving an LLM a person's profile produced no meaningful gain in predicting that specific person (see: Does conditioning LLMs on personal profiles improve prediction?). So the cross-run wobble isn't just random noise; it's the visible symptom of the model never having locked onto a real individual in the first place. Scale this up and the biases become systematic rather than merely noisy, which is why population-scale persona simulation needs genuine calibration science instead of clever prompting (see: How do we generate realistic personas at population scale?).

The constructive thread across the corpus is that the fix isn't more determinism; it's a richer anchor. Static three-to-five-sentence persona lists are the worst offenders, producing repetitive and self-contradictory output, while personas grounded in authentic self-expression (like journal entries) or extracted from real domain documents hold together far better across runs (see: Why do static persona descriptions produce repetitive dialogue? and Can personas extracted from documents generalize across evaluation tasks?). The lesson hiding in the instability: a persona prompt isn't an instruction the model obeys, it's a constraint that narrows a distribution, and a thin constraint leaves the model free to be a different person every time you ask.


Sources (9 notes)

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.

Does an LLM commit to a single character or maintain many?

Research shows LLMs don't commit to a single character but instead maintain a probability distribution over many consistent simulacra. Each response samples from this distribution, explaining why regenerations can yield different personalities while remaining consistent with prior context.

Why do LLM judges fail at predicting sparse user preferences?

Sparse persona information lacks predictive power for specific preferences, causing LLM judges to fail. Verbal uncertainty estimation recovers reliability above 80% on high-certainty samples by allowing abstention rather than forced judgment.

Does model capability translate to better persona consistency?

Claude 3.5 Sonnet achieved only 2.97% improvement over GPT 3.5 on persona consistency despite massive capability gaps, suggesting persona adherence is orthogonal to model scaling. Standard training objectives optimize for per-turn quality, not cross-turn coherence.

How stable is the trained Assistant personality in language models?

Research mapping hundreds of character archetypes reveals a low-dimensional persona space where the leading component measures distance from the default Assistant. Emotional and meta-reflective conversations cause predictable drift, but activation capping along this axis mitigates harmful shifts without degrading capabilities.

Does conditioning LLMs on personal profiles improve prediction?

Across 208,021 participants in the Psych-201 dataset, conditioning LLMs on participant profiles did not meaningfully improve predictions for specific individuals. The standard technique for individuation produces no measurable gains in person-level forecasting.

How do we generate realistic personas at population scale?

LLM persona generation produces systematic biases in downstream tasks like election forecasting because it relies on heuristic techniques that cannot recover true joint distributions from marginal data. Solving this requires benchmarks, training datasets, and structured frameworks analogous to ImageNet.

Why do static persona descriptions produce repetitive dialogue?

Journal entries capturing Big Five traits through genuine self-expression produce more consistent and nuanced dialogue than predefined 3-5 sentence persona descriptions. Personality emerges from how people express themselves, not from attribute inventories.

Can personas extracted from documents generalize across evaluation tasks?

MAJ-EVAL automatically extracts stakeholder personas from domain documents via semantic clustering and orchestrates structured three-phase debate, achieving reproducible evaluation that transfers across tasks like summarization and dialogue without manual redesign. The approach grounds personas in real stakeholder perspectives rather than arbitrary roles.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about LLM persona stability. The question remains open: why do LLM persona annotations become unstable across runs, and what fixes actually work?

What a curated library found — and when (dated claims, not current truth):
• Cross-run variation in persona output matches or exceeds variation between *different* personas, indicating model uncertainty dominates signal (~2024–2025).
• Persona adherence does not scale with model capability; a far stronger model showed almost no improvement in persona consistency across turns (~2024–2025).
• Static persona prompts (3–5 sentences) produce repetitive and self-contradictory dialogue; journal-based or document-grounded personas hold together better (~2024–2025).
• Persona induction fails to improve individual-level prediction across 200k+ participants; conditioning on a profile does not meaningfully individuate the model (~2024).
• A dominant "Assistant Axis" in persona space shows the baseline model drifts predictably under emotional or self-reflective prompting (~2026).

Anchor papers (verify; mind their dates):
• arXiv:2406.11657 (2024) — Can LLM be a Personalized Judge?
• arXiv:2506.11557 (2025) — From Persona to Person: Enhancing Naturalness with Multiple Discourse Relations Graph
• arXiv:2511.00222 (2025) — Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
• arXiv:2601.10387 (2026) — The Assistant Axis: Situating and Stabilizing the Default Persona

Your task:
(1) RE-TEST EACH CONSTRAINT. For the "no scaling with capability" claim, have instruction tuning, preference training (DPO, PPO), or scaffold-based persona anchoring (e.g., memory/caching systems, multi-turn RL) since reduced instability? Separately: do journal-grounded personas still outperform static prompts under recent harnesses, or has that advantage eroded? Where does instability still visibly hold?
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last 6 months. Watch for papers claiming persona vectors (arXiv:2507.21509), multi-turn RL (arXiv:2511.00222), or discourse-graph enrichment (arXiv:2506.11557) have *solved* consistency. Do they? On what metrics?
(3) Propose 2 research questions that assume the regime may have shifted: (a) Does post-training alignment for persona consistency (akin to constitutional AI) finally break the capability–consistency orthogonality? (b) Can persona instability be *instrumentalized* — treated as a feature for diversity sampling rather than a bug to fix?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
