Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy

Paper · arXiv 2204.07433 · Published April 7, 2022

A proactive dialogue system can lead the conversation to a goal topic and has promising potential in bargaining, persuasion, and negotiation. However, the current corpus-based learning approach limits its practical application in real-world scenarios. To this end, we advance the study of proactive dialogue policy to a more natural and challenging setting, i.e., interacting dynamically with users. Further, we call attention to non-cooperative user behavior: the user talks about off-path topics when he/she is not satisfied with the previous topics introduced by the agent. We argue that the targets of reaching the goal topic quickly and maintaining high user satisfaction do not always converge, because the topics close to the goal and the topics the user prefers may not be the same. To address this issue, we propose a new solution named I-Pro that can learn a Proactive policy in the Interactive setting. Specifically, we learn the trade-off via a learned goal weight, which consists of four factors (dialogue turn, goal completion difficulty, user satisfaction estimation, and cooperative degree). The experimental results demonstrate that I-Pro significantly outperforms baselines in terms of effectiveness and interpretability.
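To make the trade-off concrete, the following minimal Python sketch shows one way a goal weight could blend goal closeness and user preference when choosing the next topic. The sigmoid-of-linear form, the [0, 1] scaling of the four factors, and the names `Factors`, `goal_weight`, and `pick_topic` are illustrative assumptions, not the I-Pro model itself.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Factors:
    """The four factors named in the abstract.

    Assumption: each factor has already been scaled to [0, 1].
    """
    dialogue_turn: float
    goal_difficulty: float
    satisfaction: float
    cooperation: float


def goal_weight(f: Factors, params: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical goal weight: sigmoid of a linear combination of the factors.

    `params` stands in for learned coefficients; how they are actually
    learned is not covered by this sketch.
    """
    values = (f.dialogue_turn, f.goal_difficulty, f.satisfaction, f.cooperation)
    z = bias + sum(w * x for w, x in zip(params, values))
    return 1.0 / (1.0 + math.exp(-z))


def pick_topic(
    candidates: Sequence[str],
    goal_score: Dict[str, float],
    pref_score: Dict[str, float],
    w: float,
) -> str:
    """Pick the candidate maximizing w * goal closeness + (1 - w) * user preference."""
    return max(candidates, key=lambda t: w * goal_score[t] + (1.0 - w) * pref_score[t])
```

With a weight near 1 the choice is driven by closeness to the goal, and with a weight near 0 it is driven by the estimated user preference, so the weight acts as a single interpretable knob between the two targets.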

Introduction. A proactive dialogue agent aims to lead the conversation with a user from a start topic (“Andy Lau”) to a goal topic (“Raging Fire”) by chatting with the user [27], as shown in Figure 1. This task has great potential in scenarios such as bargaining [8], persuasion [5, 25], and negotiation [12, 30]. Current solutions [2, 26, 27, 31, 35] follow the corpus-based learning setting: given a knowledge graph (KG), a goal topic, and a dialogue context between two humans (e.g., the leader and the follower in the DuConv corpus [27]), the agent is required to predict a topic for the next turn and generate a response based on this topic. However, a turn-level policy might not align well with the conversation-level policy [6, 32]. Thus, we argue that the corpus-based learning setting is insufficient to meet the ultimate goal of an agent that can chat with the user dynamically. In this work, we take one step further and scrutinize proactive dialogue policy in the interactive setting.
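The interactive setting can be pictured as a loop between a policy and a simulated user. The sketch below is a toy illustration under stated assumptions: the graph, the topic names other than “Andy Lau” and “Raging Fire”, the `SimulatedUser` rule, and the greedy policy are all hypothetical and are not the simulators or agents used in this work.

```python
from collections import deque

# Toy knowledge graph as an adjacency map (made up for illustration).
KG = {
    "Andy Lau": ["Hong Kong films", "Cantopop"],
    "Hong Kong films": ["Raging Fire", "Andy Lau"],
    "Cantopop": ["Andy Lau"],
    "Raging Fire": [],
}


def distance(src, dst):
    """Breadth-first hop count from src to dst in KG (inf if unreachable)."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in KG[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return float("inf")


def greedy_policy(current, goal):
    """Goal-only policy: propose the neighbour closest to the goal."""
    return min(KG[current], key=lambda t: distance(t, goal))


class SimulatedUser:
    """Hypothetical user: accepts liked topics, otherwise goes off-path."""

    def __init__(self, liked):
        self.liked = set(liked)
        self.satisfaction = 0  # number of proposals the user liked

    def respond(self, current, proposed):
        if proposed in self.liked:
            self.satisfaction += 1
            return proposed  # cooperative: follow the agent's topic
        for alt in KG[current]:
            if alt in self.liked:
                return alt  # non-cooperative: switch to a preferred topic
        return proposed  # no preferred alternative, so go along


def run_dialogue(start, goal, user, policy, max_turns=10):
    """Alternate policy proposals and user responses until the goal or the turn limit."""
    current, path = start, [start]
    for _ in range(max_turns):
        if current == goal:
            break
        current = user.respond(current, policy(current, goal))
        path.append(current)
    return path
```

In this toy graph, a user who likes every topic is led to the goal in two turns, whereas a user who dislikes “Hong Kong films” keeps steering back to “Cantopop” and the goal-only policy never arrives, which illustrates why reaching the goal quickly and keeping the user satisfied can pull in different directions.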

Discussion / Conclusion. In this work, we study proactive dialogue policy in an interactive manner and call attention to non-cooperative user behavior during the conversation. We argue that interactive proactive dialogue policy learning has two targets: leading the conversation to the goal quickly and maintaining high user satisfaction. To advance the two targets, we propose I-Pro, which employs a learned goal weight to achieve a trade-off between them. We design user simulators to interact with the agents during training and evaluation. The experimental results demonstrate that I-Pro opens a performance gap for interactive proactive dialogue policy learning. Our work takes the first step toward advancing interactive proactive dialogue policy learning and can serve as a preliminary baseline to benefit further research. Naturally, a few loose ends remain for further investigation, especially with respect to more diverse user behavior and richer user personalities.