Conversational Graph Grounded Policy Learning for Open-Domain Conversation Generation


To address the challenge of policy learning in open-domain multi-turn conversation, we propose to represent prior information about dialog transitions as a graph and to learn a graph grounded dialog policy, with the aim of fostering more coherent and controllable dialog. To this end, we first construct a conversational graph (CG) from dialog corpora, whose vertices represent “what to say” and “how to say”, and whose edges represent natural transitions between a message (the last utterance in a dialog context) and its response. We then present a novel CG grounded policy learning framework that conducts dialog flow planning by graph traversal: at each turn, it learns to identify a what-vertex and a how-vertex from the CG to guide response generation. In this way, we leverage the CG to facilitate policy learning in three respects: (1) it enables more effective long-term reward design, (2) it provides high-quality candidate actions, and (3) it gives us more control over the policy. Results on two benchmark corpora demonstrate the effectiveness of this framework.
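The graph-traversal idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class `ConversationalGraph`, the function `select_action`, the use of keywords as what-vertices, the use of style labels as how-vertices, and the hand-written score tables are all assumptions made for illustration. In the paper the policy is learned, whereas here plain scoring functions stand in for it.

```python
from collections import defaultdict


class ConversationalGraph:
    """Toy CG (hypothetical structure): what-vertices linked by transition
    edges, plus how-vertices attached to each what-vertex."""

    def __init__(self):
        self.what_edges = defaultdict(set)  # what-vertex -> successor what-vertices
        self.how_edges = defaultdict(set)   # what-vertex -> compatible how-vertices

    def add_transition(self, src, dst):
        # Edge meaning: a message about `src` is naturally followed
        # by a response about `dst`.
        self.what_edges[src].add(dst)

    def add_how(self, what, how):
        self.how_edges[what].add(how)

    def candidate_actions(self, hit_vertices):
        # Candidate actions are the graph neighbours of the vertices
        # matched in the message, not the whole vocabulary.
        candidates = set()
        for vertex in hit_vertices:
            candidates |= self.what_edges.get(vertex, set())
        return candidates


def select_action(graph, message_keywords, score_what, score_how):
    """Pick one what-vertex and one how-vertex for the next response.

    `score_what` and `score_how` are stand-ins for a learned policy
    (an assumption of this sketch).
    """
    hits = [k for k in message_keywords if k in graph.what_edges]
    candidates = graph.candidate_actions(hits)
    if not candidates:
        return None  # no grounded action available for this message
    # Sorting first makes tie-breaking deterministic.
    what = max(sorted(candidates), key=score_what)
    hows = graph.how_edges.get(what, set())
    how = max(sorted(hows), key=score_how) if hows else None
    return what, how


# Tiny usage example with made-up vertices and scores.
cg = ConversationalGraph()
cg.add_transition("movie", "actor")
cg.add_transition("movie", "popcorn")
cg.add_how("actor", "question")
cg.add_how("actor", "statement")

what_scores = {"actor": 0.9, "popcorn": 0.2}
how_scores = {"question": 0.7, "statement": 0.3}
action = select_action(cg, ["movie", "tonight"], what_scores.get, how_scores.get)
print(action)
```

Restricting the candidate set to graph neighbours is what the abstract means by the CG providing high-quality candidate actions: the policy only chooses among transitions already observed in the corpus.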

Introduction. How to effectively learn dialog strategies is an enduring challenge for open-domain multi-turn conversation generation. To address this challenge, previous works investigate word-level policy models that simultaneously learn dialog policy and language generation from dialog corpora (Li et al., 2016b; Zhang et al., 2018b). But these word-level policy models often lead to a degeneration issue where the utterances become ungrammatical or repetitive (Lewis et al., 2017). To alleviate this issue, utterance-level policy models have been proposed to decouple policy learning from response generation, and they focus on how to incorporate high-level utterance representations, e.g., latent variables or keywords, to facilitate policy learning (He et al., 2018; Yao et al., 2018; Zhao et al., 2019). However, these utterance-level methods tend to produce less coherent multi-turn dialogs since it is quite challenging to learn semantic transitions in a dialog flow merely from dialog data without the help of prior information.

Discussion / Conclusion. In this paper, we present a novel graph grounded policy learning framework for open-domain multi-turn conversation, which can effectively leverage prior information about dialog transitions to foster a more coherent and controllable dialog. Experimental results demonstrate the effectiveness of this framework in terms of local appropriateness, global coherence, and dialog-target success rate. In the future, we will investigate how to extend the CG to support hierarchical topic management in conversational systems.

Lines of inquiry this paper opens

Research framings built by reading the notes related to this paper — the questions it feeds into.

Why do agents confidently report success despite actually failing tasks?

Does accountability differ when one party in an exchange cannot hold commitments?

How should conversational agents balance goal-driven initiative with user control?

What dialogue dynamics distinguish negotiation from standard information-provision tasks?

How should dialogue recommender systems manage conversation history and state?

How should dialogue state tracking change when user preferences shift mid-conversation?

Why do language models reinforce false assumptions instead of correcting them?

How should dialogue systems represent uncertainty from noisy speech input?

How can language models sustain linguistic synchrony and intersubjectivity during dialogue?

Can AI ever lead conversations without the anticipatory presence sustained attention provides?

How does AI-generated content transformation affect public discourse quality?

How does AI lose correct information under conversational persuasive pressure?

Why do multi-turn conversations degrade AI intent and coherence?

How do training priors constrain what context information can override?

Can next-token prediction alone produce genuine language understanding?

How does the silent token approach compare to modeling intrinsic motivation for speaking?

Why can't humans reliably detect AI-generated text despite measurable linguistic signatures?

Can AI detect sense-of-nonsense the way human readers do?

How should models express uncertainty rather than forced confident answers?

Does uncertainty quantification in model responses reduce persuasive impact on audiences?

Can model confidence signals reliably improve reasoning quality and calibration?

Do verbal uncertainty estimates calibrate better than confidence scores for personalization?

How can persona representations reduce language model variance and improve task accuracy?

Why does model uncertainty dominate persona-specific knowledge in annotation tasks?

How do we evaluate AI systems when user perception misleads actual performance?

Can systems recognize and abstain on judgments rather than hallucinating preferences?

What properties determine whether reward signals teach genuine reasoning?

Why does combining natural language with numerical scores improve prediction accuracy?

How can models identify insufficient information and respond appropriately without guessing?

How do models signal knowledge gaps through token probability?

Why does self-revision increase model confidence while degrading accuracy?

Can single models correct their own beliefs without amplifying confidence in wrong answers?
