Do persona consistency metrics actually measure dialogue quality?
Personalized dialogue systems can achieve high persona consistency scores by simply restating character descriptions, ignoring conversational relevance. Does optimizing for persona fidelity necessarily harm the coherence readers actually care about?
Personalized dialogue generation faces a persistent dual optimization problem: persona consistency and discourse coherence pull in different directions, and most methods sacrifice one for the other.
The measurement trap is revealing. Methods that achieve the highest personalization scores (e.g., Persona-Adaptive Attention, PAA) do so by "frequently generating sentences that are exact restatements of the persona description, often ignoring the relevance to the query." High persona adherence scores can be achieved trivially by copying the description, which looks like success on the persona dimension while failing on the coherence dimension. This is not a training failure but a measurement artifact: the metric rewards surface-level persona adherence.
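The trap can be illustrated with a toy metric. The sketch below is not the metric used by any of the cited papers; it assumes a crude token-overlap score (the `tokens` and `overlap` helpers and the example sentences are made up for illustration) and shows how a response that copies the persona description can max out a persona score while sharing nothing with the query.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word set; a deliberately crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z']+", text.lower()))

def overlap(candidate: str, reference: str) -> float:
    """Fraction of the reference's tokens that also appear in the candidate."""
    ref = tokens(reference)
    if not ref:
        return 0.0
    return len(tokens(candidate) & ref) / len(ref)

# Hypothetical persona, query, and two candidate responses.
persona = "I love hiking in the mountains. I have two dogs."
query = "What did you cook for dinner last night?"
copied = "I love hiking in the mountains. I have two dogs."
grounded = "I cooked pasta for dinner last night, then walked my two dogs."

# Persona score: overlap with the persona description.
persona_copied = overlap(copied, persona)
persona_grounded = overlap(grounded, persona)

# Relevance score: overlap with the query (a rough proxy for coherence).
relevance_copied = overlap(copied, query)
relevance_grounded = overlap(grounded, query)

print(f"copied:   persona={persona_copied:.2f} relevance={relevance_copied:.2f}")
print(f"grounded: persona={persona_grounded:.2f} relevance={relevance_grounded:.2f}")
```

Under this toy metric the copied response wins on persona and loses on relevance, which is the pattern the quoted observation describes. Real persona metrics are typically model-based rather than overlap-based, so this only mirrors the shape of the problem.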
The coherence side has two distinct components:
Local coherence — logical connections between adjacent sentences, ensuring they relate to each other and form a coherent sequence. This is sentence-to-sentence reasoning.
Global coherence — higher-level relationships across the entire dialogue, maintaining topic consistency and effectively conveying meaning throughout an interaction. Poor global coherence impairs understanding of the discourse as a cohesive whole.
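To make the local/global distinction concrete, here is a minimal sketch that scores both with word-set Jaccard similarity. This is an assumption made for illustration, not how any cited system measures coherence: the function names are hypothetical, and a real system would use sentence embeddings or a trained coherence model. Local coherence compares each utterance with the one before it; global coherence compares each utterance with the rest of the dialogue.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for sentence-level semantics."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two texts, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def local_coherence(utterances: list[str]) -> float:
    """Mean similarity between each utterance and the one right before it."""
    pairs = list(zip(utterances, utterances[1:]))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def global_coherence(utterances: list[str]) -> float:
    """Mean similarity between each utterance and the rest of the dialogue."""
    if len(utterances) < 2:
        return 0.0
    scores = []
    for i, utterance in enumerate(utterances):
        rest = " ".join(utterances[:i] + utterances[i + 1:])
        scores.append(jaccard(utterance, rest))
    return sum(scores) / len(scores)

# Hypothetical dialogue: each sentence links to its neighbour,
# but the topic moves from hiking to coffee prices.
dialogue = [
    "We hiked the ridge trail",
    "The trail ended at a cafe",
    "The cafe serves good espresso",
    "Espresso prices keep rising",
]
print(local_coherence(dialogue), global_coherence(dialogue))
```

The two functions deliberately look at different neighbourhoods of the same dialogue, which is why a chain of locally linked sentences can still drift as a whole.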
MUDI addresses the trade-off by incorporating discourse relations directly into the generation architecture. Using 16 discourse relation types from the STAC annotation scheme plus a topic-shift relation, an LLM (LLaMA-3-70B) annotates coherence relations between utterance pairs. A graph encoder (DialogueGAT) captures these interactive relationships, with Sentence-BERT initializing node features for sentence-level semantics. The key architectural additions are order information and turn information integrated via attention mechanisms.
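The kind of structure such a graph encoder consumes can be sketched as a small data model. This is not MUDI's implementation: the class names, the handful of relation labels (an illustrative subset, not the full inventory), and the rule used to derive turn indices are all assumptions, and the relations are hard-coded where the described pipeline would use LLM annotation. No graph attention or Sentence-BERT features are included.

```python
from dataclasses import dataclass, field

# Illustrative subset of relation labels; the label strings are assumptions.
RELATIONS = {"Question-answer_pair", "Elaboration", "Acknowledgement", "Topic-shift"}

@dataclass
class Utterance:
    text: str
    speaker: str
    order: int  # position of the utterance in the dialogue
    turn: int   # index of the speaker turn it belongs to (assumed definition)

@dataclass
class DialogueGraph:
    nodes: list[Utterance] = field(default_factory=list)
    edges: list[tuple[int, int, str]] = field(default_factory=list)

    def add_utterance(self, text: str, speaker: str) -> int:
        """Append an utterance and return its node index."""
        order = len(self.nodes)
        if not self.nodes:
            turn = 0
        elif self.nodes[-1].speaker == speaker:
            turn = self.nodes[-1].turn      # same speaker continues the turn
        else:
            turn = self.nodes[-1].turn + 1  # speaker change starts a new turn
        self.nodes.append(Utterance(text, speaker, order, turn))
        return order

    def relate(self, src: int, dst: int, relation: str) -> None:
        """Add a labeled edge from an earlier utterance to a later one."""
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        if not (0 <= src < dst < len(self.nodes)):
            raise ValueError("edges must point from an earlier to a later utterance")
        self.edges.append((src, dst, relation))

    def incoming(self, dst: int) -> list[tuple[int, str]]:
        """Source index and relation for every edge ending at dst."""
        return [(s, r) for s, d, r in self.edges if d == dst]

g = DialogueGraph()
q = g.add_utterance("What did you do this weekend?", "user")
a = g.add_utterance("I went hiking with my dogs.", "bot")
e = g.add_utterance("We did a ridge trail near the lake.", "bot")
g.relate(q, a, "Question-answer_pair")
g.relate(a, e, "Elaboration")
print(g.incoming(e))
```

Keeping `order` and `turn` on each node is what would let a downstream encoder distinguish "next sentence by the same speaker" from "reply by the other speaker", which is the role the order and turn information play in the description above.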
The broader principle: persona fidelity and contextual coherence must be jointly optimized, not separately measured. Read alongside the note "Why does supervised learning fail to enforce persona consistency?", this shows that the training method (RL for consistency) and the generation architecture (discourse-aware for coherence) address different dimensions of the same problem. Neither alone is sufficient.
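One way to see why separate measurement misleads is to combine the two scores so that neither can compensate for the other. The harmonic mean below is an assumption chosen for illustration, not an aggregation proposed in the source; `joint_score` is a hypothetical helper that expects both inputs in [0, 1].

```python
def joint_score(persona: float, coherence: float) -> float:
    """Harmonic mean of two scores in [0, 1]; zero whenever either score is zero."""
    if not (0.0 <= persona <= 1.0 and 0.0 <= coherence <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    if persona + coherence == 0.0:
        return 0.0
    return 2 * persona * coherence / (persona + coherence)

# A response that copies the persona but ignores the query,
# versus one that is moderately good on both dimensions.
copier = joint_score(persona=1.0, coherence=0.0)
balanced = joint_score(persona=0.6, coherence=0.6)
print(copier, balanced)
```

With this aggregation, a perfect persona score paired with no coherence ranks below a response that is only moderately strong on both.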
This connects to the three-failure-modes analysis in the note "Why do static persona descriptions produce repetitive dialogue?". The persona-restatement failure identified by MUDI is a fourth failure mode alongside repetitiveness, shallowness, and contradiction: contextual irrelevance, that is, generating persona-consistent but conversationally inappropriate responses.
Inquiring lines that use this note as a source (19)
This note is a source for these synthesized inquiries.
- How does persona consistency affect coherence in simulated dialogue?
- What does the 20-questions test reveal about LLM character consistency?
- Do synthetic personas maintain consistency across multiple conversations?
- What are the three distinct types of persona drift in dialogue systems?
- What training objectives would actually improve persona consistency at scale?
- Can offline RL scale persona consistency across multi-turn conversations?
- How can training methods enforce persona consistency without supervised learning penalizing it?
- Can persona consistency coexist with relevant dialogue in personalized conversation?
- How does distractor persona selection affect consistency enforcement in dialogue?
- Why is persona consistency a pragmatic property rather than semantic?
- What makes extended personal narratives more effective than attribute lists for personas?
- How does tree-structured persona maintenance prevent character drift in long conversations?
- Does linguistic style or content richness matter more for persona authenticity?
- Why does static persona definition fail to capture natural variation?
- Does persona assignment alone produce repetitive dialogue without situational grounding?
- How much does interview richness matter compared to model capability for persona accuracy?
- Can persona prompts reliably transfer across different question domains?
- How should persona prompts be used if not for accuracy?
- How do persona consistency and contextual relevance trade off in personalized dialogue systems?
Related concepts in this collection (4)
- Why does supervised learning fail to enforce persona consistency?
  Supervised learning trains models to generate good responses but never punishes contradictions. This note explores why explicit negative feedback is structurally necessary for dialogue agents to maintain consistent personas, and what training methods can provide it.
  training mechanism for consistency; MUDI adds architectural mechanism for coherence
- Why do static persona descriptions produce repetitive dialogue?
  Does relying on fixed attribute lists to define conversational personas limit dialogue depth and consistency? Research suggests static descriptions may cause repetition and self-contradiction in generated responses.
  persona-restatement is a fourth failure mode alongside repetitiveness, shallowness, contradiction
- How do readers track segments, purposes, and salience together?
  Can discourse processing actually happen in parallel rather than sequentially? This matters because understanding how readers coordinate multiple layers of meaning at once reveals where AI systems break down in comprehension.
  Grosz & Sidner's framework aligns with MUDI's local/global coherence distinction
- Why does ChatGPT fail at implicit discourse relations?
  ChatGPT excels when discourse connectives are present but drops to 24% accuracy without them. What does this gap reveal about how LLMs actually process meaning and logical relationships?
  MUDI's explicit discourse relation annotation may compensate for LLMs' implicit relation weakness
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- From Persona to Person: Enhancing the Naturalness with Multiple Discourse Relations Graph Learning in Personalized Dialogue Generation
- Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness
- Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
- Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning
- Chamain: Harmonizing Character Persona Integrity with Domain-Adaptive Knowledge in Dialogue Generation
- Personalized Dialogue Generation with Persona-Adaptive Attention
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue
- PersonaPKT: Building Personalized Dialogue Agents via Parameter-efficient Knowledge Transfer
Original note title
persona consistency trades off against discourse coherence in personalized dialogue — models that prioritize persona restate descriptions at the expense of contextual relevance