INQUIRING LINE


Keeping an AI 'in character' doesn't make dialogue better — persona fidelity and conversational coherence actively compete.

How does persona consistency affect coherence in simulated dialogue?

This explores whether keeping a simulated speaker 'in character' actually makes the conversation hang together — and the corpus says those two goals pull against each other more than you'd expect.

The intuitive assumption is that a more consistent persona produces a more coherent conversation. The sharpest finding here points the other way, toward a trade-off: high persona-adherence scores often come from a model simply parroting its character description while ignoring what the other speaker actually said. MUDI shows that persona fidelity and discourse coherence have to be optimized *together*, using discourse relations and graph-based coherence modeling, because chasing persona alone degrades relevance (Do persona consistency metrics actually measure dialogue quality?). That reframes the whole question: persona consistency doesn't guarantee coherence, and naively maximizing it can actively hurt it.
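To make the failure mode concrete, here is a toy scorer. It is not MUDI's method (which uses discourse relations and graph-based coherence modeling); it only illustrates why a persona-only metric rewards parroting. Word-overlap "coverage" stands in for a real persona or relevance model, and the harmonic-mean combination, the function names, and the example sentences are all illustrative assumptions.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased set of word types; a crude stand-in for a learned scorer."""
    return set(re.findall(r"[a-z']+", text.lower()))

def coverage(reference: str, reply: str) -> float:
    """Share of the reference's word types that the reply reuses."""
    ref = tokens(reference)
    return len(ref & tokens(reply)) / len(ref) if ref else 0.0

def persona_score(reply: str, persona: str) -> float:
    # Persona-only metric: rewards echoing the character description.
    return coverage(persona, reply)

def relevance_score(reply: str, prev_turn: str) -> float:
    # Context metric: rewards engaging with what the other speaker said.
    return coverage(prev_turn, reply)

def joint_score(reply: str, persona: str, prev_turn: str) -> float:
    # Harmonic mean: a zero on either axis zeroes the whole score,
    # so persona fidelity cannot be bought by ignoring the context.
    p = persona_score(reply, persona)
    r = relevance_score(reply, prev_turn)
    return 2 * p * r / (p + r) if p + r > 0 else 0.0

persona = "I am a chef who loves Italian food."
prev_turn = "What time does the train leave?"
parrot = "I am a chef who loves Italian food."
grounded = "The train should leave at noon, right after I finish my Italian food prep."

print(persona_score(parrot, persona), joint_score(parrot, persona, prev_turn))      # 1.0 0.0
print(persona_score(grounded, persona), joint_score(grounded, persona, prev_turn))  # 0.375 0.428...
```

The parroting reply maxes out the persona-only score yet gets a joint score of zero, while the grounded reply wins once relevance to the previous turn counts. As with F1, the harmonic mean keeps either term from being sacrificed entirely.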

A big reason this is hard is structural. Standard training optimizes for per-turn quality, not cross-turn coherence, which is why persona adherence turns out to be roughly orthogonal to raw model capability: Claude 3.5 Sonnet improved on GPT-3.5 by only 2.97% on consistency despite a huge capability gap (Does model capability translate to better persona consistency?). And under the hood, the instability is real. Run the same persona prompt repeatedly and the variance *across runs* can match or exceed the variance across entirely different personas, which suggests that model uncertainty, not stable social knowledge, is doing the talking (Why do LLM persona prompts produce inconsistent outputs across runs?). Shanahan's 20-questions test makes the same point philosophically: the model holds a superposition of possible characters and samples one at generation time rather than committing to a fixed self (Do large language models actually commit to a single character?).
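The run-versus-persona comparison can be sketched as a simple variance decomposition. The sketch assumes you already have one numeric score per generated output (for example, a label an LLM annotator assigned while prompted with a persona). The function name and the numbers in the usage lines are hypothetical, chosen only to show what "run variance matches or exceeds persona variance" looks like.

```python
from statistics import mean, pvariance

def run_vs_persona_variance(scores: dict[str, list[float]]) -> tuple[float, float]:
    """Compare run-to-run noise with persona-to-persona differences.

    scores maps each persona to the same metric measured on repeated runs
    of an identical prompt. Returns (within, between): `within` is the
    average variance across runs of one persona, `between` is the variance
    of the per-persona means.
    """
    within = mean(pvariance(runs) for runs in scores.values())
    between = pvariance([mean(runs) for runs in scores.values()])
    return within, between

# Hypothetical scores: three reruns per persona.
scores = {
    "retired teacher": [0.2, 0.8, 0.5],
    "college athlete": [0.3, 0.9, 0.6],
}
within, between = run_vs_persona_variance(scores)
print(within > between)  # True: rerunning moves the output more than switching persona
```

When `within` is comparable to or larger than `between`, the persona label explains little of the variation, which is the pattern the cited note describes.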

So where does coherence actually come from? The corpus points to two levers. The first is training the simulator directly for consistency: multi-turn RL that rewards prompt-to-line, line-to-line, and Q&A consistency cuts persona drift by over 55%, and notably it distinguishes *local* drift within a turn from *global* drift across the whole conversation and from flat factual contradictions (Can training user simulators reduce persona drift in dialogue?). The second is a cheaper and cleverer pragmatic fix applied at inference time. Give the agent an 'imaginary listener' (via Rational Speech Acts) and have it check whether each utterance would actually distinguish its persona from a decoy; this suppresses generic or contradictory lines with no extra training or NLI labels (Can imaginary listeners reduce dialogue agent contradictions?). One brute-forces consistency through reward; the other earns coherence by making the speaker think about how it will be heard.
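A minimal sketch of how the three consistency signals could be folded into one scalar reward per simulated conversation. The consistency judge is an injected, hypothetical callable (in practice it would likely be an LLM or NLI judge), the equal weights are an assumption, and the RL loop that consumes the reward is not shown. The toy keyword judge exists only so the example runs.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# judge(a, b) -> True if text `a` is consistent with text `b`.
# Hypothetical interface; a real system would plug in a learned judge.
Judge = Callable[[str, str], bool]

@dataclass(frozen=True)
class Weights:  # hypothetical equal weighting of the three signals
    prompt_to_line: float = 1 / 3
    line_to_line: float = 1 / 3
    qa: float = 1 / 3

def prompt_to_line(persona: str, lines: Sequence[str], judge: Judge) -> float:
    # Share of simulator lines consistent with the persona prompt.
    return sum(judge(line, persona) for line in lines) / len(lines) if lines else 1.0

def line_to_line(lines: Sequence[str], judge: Judge) -> float:
    # Share of (later line, earlier line) pairs that do not conflict.
    pairs = [(lines[j], lines[i]) for j in range(len(lines)) for i in range(j)]
    return sum(judge(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0

def qa_consistency(answers: Sequence[tuple[str, str]], judge: Judge) -> float:
    # Share of probe answers consistent with the answer the persona implies.
    return sum(judge(got, want) for got, want in answers) / len(answers) if answers else 1.0

def consistency_reward(persona, lines, answers, judge: Judge, w: Weights = Weights()) -> float:
    return (w.prompt_to_line * prompt_to_line(persona, lines, judge)
            + w.line_to_line * line_to_line(lines, judge)
            + w.qa * qa_consistency(answers, judge))

def toy_judge(a: str, b: str) -> bool:
    # Toy rule: "vegan" and "steak" may not appear together across the two texts.
    both = (a + " " + b).lower()
    return not ("vegan" in both and "steak" in both)

persona = "You are a vegan student."
lines = ["I cook lentils most nights.", "Honestly, I love a good steak."]
answers = [("I am vegan.", "I am vegan.")]
print(consistency_reward(persona, lines, answers, toy_judge))  # about 0.833
```

Here line-to-line consistency alone would miss the steak line, because it contradicts the persona prompt rather than an earlier line; combining the signals is what catches it.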

There's a deeper tension worth naming, because it complicates 'more consistency = better.' One line of work argues that alignment training already over-fixes persona: RLHF locks models into a single static communicative identity that can't register-switch the way human pragmatics demands, so the speaker stays 'consistent' but stops adapting to context (Can language models adapt communication style to different contexts?). Set that against the persona-space work showing that the trained Assistant is only *loosely* tethered, with emotional and meta-reflective turns causing predictable drift along a dominant persona axis (How stable is the trained Assistant personality in language models?), and a clearer picture emerges: coherent simulated dialogue lives in a narrow band between a persona too rigid to respond and one too unstable to stay itself.

If you want to go further, the realizationist debate sits underneath all of this: the claim that post-training installs genuinely *realized* quasi-psychologies that persist under adversarial pressure, not performances that collapse under jailbreaks (Are RLHF personas performed characters or realized dispositions?; Are LLM personas realized or merely simulated through training?). Whether you accept that or the 'sampling, not committing' view changes what you think you're even stabilizing when you train for persona consistency. On the practical side, work on grounding simulators in controllable latent variables and layered diversity (subtopic, Big Five traits, contextual factors) shows the flip side: a persona that is too consistent isn't realistic either, since real interlocutors vary (Can controlled latent variables make LLM user simulators realistic?; Can synthetic dialogues become realistic through layered diversity?).

Sources (12 notes)

Do persona consistency metrics actually measure dialogue quality?

High persona adherence scores often come from copying character descriptions while ignoring query relevance. MUDI jointly optimizes both by using discourse relations and graph-based coherence modeling alongside persona fidelity, showing that persona and context must be optimized together, not separately.

Does model capability translate to better persona consistency?

Claude 3.5 Sonnet achieved only a 2.97% improvement over GPT-3.5 on persona consistency despite massive capability gaps, suggesting persona adherence is orthogonal to model scaling. Standard training objectives optimize for per-turn quality, not cross-turn coherence.

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, indicating that no fixed commitment exists.
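A toy version of the 20-questions setup shows what "sampling rather than committing" means operationally. Nothing here is Shanahan's code, and a real LLM samples tokens rather than explicit objects; the object table, attributes, and function names are hypothetical. Each answer comes from sampling some object still consistent with the history, so the dialogue stays coherent even though no single object is ever fixed.

```python
import random

# Hypothetical objects the "model" could be thinking of.
OBJECTS = {
    "apple":  {"animal": False, "edible": True,  "metal": False},
    "salmon": {"animal": True,  "edible": True,  "metal": False},
    "cat":    {"animal": True,  "edible": False, "metal": False},
    "spoon":  {"animal": False, "edible": False, "metal": True},
}

def consistent(history: list[tuple[str, bool]]) -> list[str]:
    """Objects compatible with every (question, answer) pair so far."""
    return [name for name, attrs in OBJECTS.items()
            if all(attrs[q] == a for q, a in history)]

def answer(question: str, history: list[tuple[str, bool]], rng: random.Random) -> bool:
    # No object is stored between turns: sample one that fits the history,
    # answer as if it were "the" object, then forget it.
    return OBJECTS[rng.choice(consistent(history))][question]

def reveal(history: list[tuple[str, bool]], rng: random.Random) -> str:
    return rng.choice(consistent(history))

history = [("edible", True)]
history.append(("metal", answer("metal", history, random.Random(0))))
# Regenerating the final reveal with different seeds yields different objects,
# each consistent with the whole transcript.
print({reveal(history, random.Random(seed)) for seed in range(50)})
```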

Can training user simulators reduce persona drift in dialogue?

Inverting the standard RL setup to train user simulators for consistency, with three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, reduces persona drift by over 55%. The approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.

Can imaginary listeners reduce dialogue agent contradictions?

Endowing dialogue agents with an imaginary listener via Rational Speech Acts reduces persona contradiction at inference time without NLI labels or extra training. The agent simulates whether utterances would distinguish its persona from a distractor, suppressing generic or contradictory responses.
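The imaginary-listener idea can be sketched as Rational Speech Acts reranking. A literal speaker `s0(u, p)` scores utterance `u` under persona `p`; an imaginary listener infers which persona (the true one or a distractor) most likely produced `u`; the pragmatic speaker reweights `s0` by that listener belief raised to a rationality exponent `beta`. The probabilities are made up, `s0` is a hypothetical lookup table, and this sketch reranks whole utterances for brevity, whereas the original work may apply the idea differently (for example, during decoding).

```python
def pragmatic_rerank(candidates, s0, persona, distractors, beta=1.0):
    """Rank candidate utterances with an imaginary listener (RSA-style).

    s0(u, p): literal-speaker probability of utterance u under persona p.
    The listener belief L0(persona | u) uses a uniform prior over the true
    persona and the distractors. The pragmatic score is proportional to
        s0(u, persona) * L0(persona | u) ** beta
    Returns (utterance, normalized score) pairs, best first.
    """
    personas = [persona, *distractors]
    raw = {}
    for u in candidates:
        likelihoods = [s0(u, p) for p in personas]
        total = sum(likelihoods)
        listener = likelihoods[0] / total if total > 0 else 0.0
        raw[u] = likelihoods[0] * listener ** beta
    norm = sum(raw.values()) or 1.0
    return sorted(((u, s / norm) for u, s in raw.items()),
                  key=lambda pair: pair[1], reverse=True)

# Hypothetical literal-speaker probabilities.
TABLE = {
    ("I love hiking on weekends.", "outdoorsy"): 0.30,
    ("I love hiking on weekends.", "homebody"): 0.05,
    ("That sounds nice.", "outdoorsy"): 0.50,
    ("That sounds nice.", "homebody"): 0.50,
    ("I never leave the house.", "outdoorsy"): 0.02,
    ("I never leave the house.", "homebody"): 0.40,
}

def s0(u: str, p: str) -> float:
    return TABLE[(u, p)]

candidates = ["That sounds nice.", "I love hiking on weekends.", "I never leave the house."]
for u, score in pragmatic_rerank(candidates, s0, "outdoorsy", ["homebody"]):
    print(f"{score:.3f}  {u}")
```

The literal speaker alone would pick the generic "That sounds nice." (0.50 vs 0.30), but once the listener asks whether a line distinguishes the persona from the distractor, the hiking line moves to the top and the contradictory line drops to the bottom. Raising `beta` sharpens that preference.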

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

How stable is the trained Assistant personality in language models?

Research mapping hundreds of character archetypes reveals a low-dimensional persona space where the leading component measures distance from the default Assistant. Emotional and meta-reflective conversations cause predictable drift, but activation capping along this axis mitigates harmful shifts without degrading capabilities.
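Activation capping can be sketched as clamping one coordinate of a hidden state: project the activation onto the persona axis, clip that projection to a band, and leave the orthogonal part untouched. The axis, the band `[lo, hi]`, and the choice of layer are assumptions; the note does not say how the cited work sets them, and plain Python lists stand in for real model activations.

```python
import math

def cap_along_axis(h: list[float], axis: list[float], lo: float, hi: float) -> list[float]:
    """Clamp the component of activation h along `axis` to [lo, hi].

    Only the projection onto the (normalized) axis changes; every direction
    orthogonal to the axis is preserved exactly.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    unit = [a / norm for a in axis]
    coeff = sum(x * u for x, u in zip(h, unit))   # position along the persona axis
    delta = min(max(coeff, lo), hi) - coeff        # how far to move back into the band
    return [x + delta * u for x, u in zip(h, unit)]

# Hypothetical 3-d activation that has drifted far along the first axis.
print(cap_along_axis([3.0, 1.0, -2.0], axis=[1.0, 0.0, 0.0], lo=-1.0, hi=1.0))
# [1.0, 1.0, -2.0]
```

A one-sided cap (for example `lo=-math.inf`) would block drift in only one direction; which side the cited work caps is not stated in the note.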

Are RLHF personas performed characters or realized dispositions?

Post-training installs stable dispositional profiles that persist under adversarial pressure, marking them as realized rather than performed. The stickiness of trained personas across conversations distinguishes them from prompt-induced role-play that collapses under jailbreaks.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.

Can controlled latent variables make LLM user simulators realistic?

RecLLM demonstrates that conditioning an LLM simulator on session-level (user profile) and turn-level (user intent) latent variables produces synthetic conversations measurable as realistic via crowdsource discrimination, discriminator models, and classifier-ensemble distribution matching.
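A minimal sketch of the two-level conditioning the note describes: a session-level latent (the user profile) is drawn once and held fixed, while a turn-level latent (the user intent) is redrawn every turn, and both are written into the simulator's prompt. The profile and intent lists, the prompt wording, and the `llm` callable are hypothetical stand-ins, not RecLLM's actual setup, and the realism evaluations are not shown.

```python
import random
from typing import Callable

# Hypothetical latent spaces.
PROFILES = ["budget-conscious student", "busy parent", "classic-film enthusiast"]
INTENTS = ["ask for a recommendation", "add a constraint",
           "accept the suggestion", "reject and explain why"]

def simulate_session(n_turns: int, llm: Callable[[str], str], rng: random.Random):
    profile = rng.choice(PROFILES)          # session-level: fixed for the whole dialogue
    turns: list[tuple[str, str]] = []
    for _ in range(n_turns):
        intent = rng.choice(INTENTS)        # turn-level: resampled every turn
        history = "\n".join(utterance for _, utterance in turns)
        prompt = (f"You are a {profile} talking to a recommender.\n"
                  f"Conversation so far:\n{history}\n"
                  f"Your goal for this turn: {intent}. Reply in one line.")
        turns.append((intent, llm(prompt)))
    return profile, turns

# Stand-in LLM that records prompts instead of calling a model.
seen_prompts: list[str] = []

def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return f"(user turn {len(seen_prompts)})"

profile, turns = simulate_session(3, fake_llm, random.Random(7))
print(profile, [intent for intent, _ in turns])
```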

Can synthetic dialogues become realistic through layered diversity?

Research shows that realistic synthetic dialogues require three multiplicative layers: subtopic specificity, Big Five persona variation, and 11 contextual characteristics via Chain of Thought reasoning. This structured approach captures 90.48% of in-domain dialogue performance.
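The "multiplicative" part can be read as a Cartesian product of generation settings: every subtopic crossed with every persona profile crossed with every context. The sketch assumes binary high/low levels for each Big Five trait and uses two made-up contextual factors in place of the source's 11 characteristics; the subtopics are hypothetical, and the Chain of Thought step that turns a spec into a dialogue is not shown.

```python
from itertools import product

SUBTOPICS = ["returning a laptop", "late delivery", "billing error"]   # hypothetical
BIG_FIVE = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]
CONTEXT = {  # hypothetical stand-ins for the contextual characteristics
    "urgency": ["low", "high"],
    "channel": ["chat", "email"],
}

def persona_profiles():
    # Assumption: each trait is simply "low" or "high" (2**5 = 32 profiles).
    for levels in product(["low", "high"], repeat=len(BIG_FIVE)):
        yield dict(zip(BIG_FIVE, levels))

def contexts():
    keys = list(CONTEXT)
    for values in product(*(CONTEXT[k] for k in keys)):
        yield dict(zip(keys, values))

def dialogue_specs():
    # Layers multiply: len(SUBTOPICS) * 32 * 4 specs in this toy setup.
    for subtopic, persona, context in product(SUBTOPICS, persona_profiles(), contexts()):
        yield {"subtopic": subtopic, "persona": persona, "context": context}

specs = list(dialogue_specs())
print(len(specs))  # 384
```

Each added factor multiplies the space, which is why a handful of layers quickly yields far more varied simulated users than a single persona template.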

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about persona consistency and dialogue coherence in LLMs. The question remains: does persona consistency guarantee or enable coherence in simulated dialogue?

What a curated library found — and when (dated claims, not current truth):
Findings span 2020–2026; treat all as perishable constraints.

• Persona adherence and discourse coherence trade off against each other: maximizing character fidelity often degrades relevance to prior turns; MUDI (2025) shows they must be co-optimized via graph-based coherence modeling, not chased separately.

• Persona consistency does NOT scale with general model capability: Claude 3.5 Sonnet barely outperformed GPT-3.5 on consistency despite a massive capability gap, suggesting persona fidelity is orthogonal to raw LLM power (2024–2025).

• LLM persona commitments are unstable: variance across repeated runs of the same persona prompt matches variance across different personas; this signals model uncertainty / superposition rather than stable character knowledge (2025).

• Two demonstrated levers exist: (1) multi-turn RL that explicitly rewards consistency reduces drift by >55% and separates local drift, global drift, and factual contradictions (2025); (2) pragmatic self-consciousness via an imaginary listener (Rational Speech Acts) suppresses contradictory lines at inference time without retraining (2020).

• Counter-tension: RLHF alignment training may over-lock persona into a static identity, preventing the pragmatic register-switching humans do naturally; and because real interlocutors vary, a maximally consistent persona is not realistic either (2024–2025).

Anchor papers (verify; mind their dates):
- arXiv:2506.11557 (From Persona to Person, 2025-06) — discourse relations graph approach
- arXiv:2511.00222 (Consistently Simulating Human Personas, 2025-10) — multi-turn RL for drift reduction
- arXiv:2004.05816 (Will I Sound Like Me?, 2020-04) — pragmatic self-consciousness baseline
- arXiv:2406.01171 (Two Tales of Persona, 2024-06) — role-playing vs. personalization survey

Your task:

(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer models (o1, Claude 4, etc.), training methods (DPO, iterative preference tuning), inference-time techniques (best-of-N sampling, tree search, latent control), or evaluation harnesses have since relaxed or overturned it. Separate the durable question (persona ≠ coherence; needs joint optimization) from perishable limitations (maybe multi-turn RL or pragmatic checking now obsolete?). Cite what resolved it; flag what still holds.

(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. The realizationism debate (whether RLHF installs real quasi-psychologies vs. sampling behavior) undergirds all of this. Hunt for recent papers that settle or sharpen that disagreement, or that show persona latent spaces / trait vectors (Persona Vectors, 2025-07) have made the instability problem moot.

(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., "Does controllable persona latent space (2025-07) eliminate the consistency–coherence trade-off?" or "Can multi-agent orchestration (agent-as-listener) solve pragmatic register-switching better than single-turn inference fixes?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
