Conversational Graph Grounded Policy Learning for Open-Domain Conversation Generation
To address the challenge of policy learning in open-domain multi-turn conversation, we propose to represent prior information about dialog transitions as a graph and to learn a graph grounded dialog policy, with the aim of fostering more coherent and controllable dialogs. To this end, we first construct a conversational graph (CG) from dialog corpora, in which vertices represent “what to say” and “how to say”, and edges represent natural transitions between a message (the last utterance in a dialog context) and its response. We then present a novel CG grounded policy learning framework that conducts dialog flow planning by graph traversal: at each turn, the policy learns to identify a what-vertex and a how-vertex from the CG to guide response generation. The CG facilitates policy learning in three ways: (1) it enables more effective long-term reward design, (2) it provides high-quality candidate actions, and (3) it gives us more control over the policy. Results on two benchmark corpora demonstrate the effectiveness of this framework.
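To make the idea of planning by graph traversal concrete, the following is a minimal sketch, not the framework itself. It assumes that what-vertices and how-vertices are plain string labels, that how-vertices are attached to the what-vertex they express, and that some scoring function ranks candidates. The names `ConversationalGraph`, `select_action`, and `toy_score` are hypothetical and used only for illustration. In the framework the selection is learned, whereas the sketch uses a fixed score.

```python
from collections import defaultdict


class ConversationalGraph:
    """Toy conversational graph (illustrative assumption, not the paper's data structure)."""

    def __init__(self):
        # what-vertex -> what-vertices reachable by a natural message-to-response transition
        self.what_edges = defaultdict(set)
        # what-vertex -> how-vertices (assumed here to be ways of expressing that content)
        self.how_edges = defaultdict(set)

    def add_transition(self, src_what, dst_what):
        self.what_edges[src_what].add(dst_what)

    def add_how(self, what, how):
        self.how_edges[what].add(how)

    def what_candidates(self, current_what):
        # Sorted for deterministic iteration; .get avoids creating empty entries.
        return sorted(self.what_edges.get(current_what, ()))

    def how_candidates(self, what):
        return sorted(self.how_edges.get(what, ()))


def select_action(graph, current_what, score):
    """Pick the highest-scoring (what, how) pair among graph neighbours.

    `score(src, dst)` is a hypothetical stand-in for a learned policy.
    Returns None when the current vertex has no outgoing transitions.
    """
    whats = graph.what_candidates(current_what)
    if not whats:
        return None
    next_what = max(whats, key=lambda w: score(current_what, w))
    hows = graph.how_candidates(next_what)
    next_how = max(hows, key=lambda h: score(next_what, h)) if hows else None
    return next_what, next_how


# Usage with made-up vertices and a made-up score (longer labels score higher).
cg = ConversationalGraph()
cg.add_transition("weather", "rain")
cg.add_transition("weather", "travel")
cg.add_how("travel", "question")
cg.add_how("travel", "statement")


def toy_score(src, dst):
    return len(dst)


action = select_action(cg, "weather", toy_score)
```

Restricting the candidates to the neighbours of the current vertex is what lets the graph supply a small, high-quality action set at each turn, which is the second of the three benefits listed above.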
Introduction. Learning dialog strategies effectively is an enduring challenge for open-domain multi-turn conversation generation. To address this challenge, previous work has investigated word-level policy models that simultaneously learn dialog policy and language generation from dialog corpora (Li et al., 2016b; Zhang et al., 2018b). However, these word-level policy models often suffer from degeneration, in which the utterances become ungrammatical or repetitive (Lewis et al., 2017). To alleviate this issue, utterance-level policy models have been proposed to decouple policy learning from response generation; they focus on incorporating high-level utterance representations, e.g., latent variables or keywords, to facilitate policy learning (He et al., 2018; Yao et al., 2018; Zhao et al., 2019). Yet these utterance-level methods tend to produce less coherent multi-turn dialogs, since it is difficult to learn semantic transitions in a dialog flow from dialog data alone, without the help of prior information.
Discussion / Conclusion. In this paper we present a novel graph grounded policy learning framework for open-domain multi-turn conversation, which effectively leverages prior information about dialog transitions to foster more coherent and controllable dialogs. Experimental results demonstrate the effectiveness of this framework in terms of local appropriateness, global coherence, and dialog-target success rate. In the future, we will investigate how to extend the CG to support hierarchical topic management in conversational systems.