Enhancing Large Language Model Induced Task-Oriented Dialogue Systems Through Look-Forward Motivated Goals
Recently, the development of large language models (LLMs) has significantly enhanced question answering and dialogue generation, making LLMs increasingly popular in practical scenarios. Unlike general dialogue systems, which emphasize semantic performance, task-oriented dialogue (ToD) systems aim to achieve the dialogue goal efficiently and successfully over multiple turns. Unfortunately, existing LLM-induced ToD systems lack a direct reward toward the final goal and do not take into account dialogue proactivity, which can strengthen dialogue efficiency. To fill these gaps, we introduce the ProToD (Proactively Goal-Driven LLM-Induced ToD) approach, which anticipates future dialogue actions and incorporates a goal-oriented reward signal to enhance ToD systems. Additionally, we present a novel evaluation method that assesses ToD systems based on goal-driven dialogue simulations. This method allows us to gauge user satisfaction, system efficiency, and success rate while overcoming the limitations of the current Inform and Success metrics.
Introduction. A task-oriented dialogue system is designed to assist users in achieving specific objectives. Its primary focus is on comprehending user needs and generating appropriate responses. The success rate is a pivotal metric in evaluating the effectiveness of a ToD system: a higher success rate indicates that the system is adept at meeting user requirements. Efficiency, in turn, is gauged by the number of turns in a conversation. Fewer turns signify greater efficiency, underscoring the need for the system to be proactive. Drawing on the concept of proactivity in organizational behavior (Grant and Ashford, 2008) and standard dictionary definitions (English, 1976), the proactivity of conversational agents can be characterized as their ability to steer or control a dialogue by taking the initiative and foreseeing potential impacts on themselves or on users. In essence, the ultimate success of a ToD system lies mainly in its proactive nature and its capacity to address user needs effectively and efficiently.
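The two quantities described above, success rate and number of turns, can be made concrete with a small sketch. It is an illustration only, not the evaluation protocol of the paper: the `DialogueResult` record, its field names, and the sample values are hypothetical and assume each dialogue has already been labeled as having achieved its goal or not.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DialogueResult:
    """Outcome of one dialogue (hypothetical record, not taken from the paper)."""
    num_turns: int       # number of turns the conversation took
    goal_achieved: bool  # whether the user's goal was met


def success_rate(results: List[DialogueResult]) -> float:
    """Fraction of dialogues in which the user goal was achieved."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(r.goal_achieved for r in results) / len(results)


def average_turns(results: List[DialogueResult]) -> float:
    """Mean number of turns per dialogue; fewer turns means higher efficiency."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(r.num_turns for r in results) / len(results)


# Made-up sample data, for illustration only.
results = [
    DialogueResult(num_turns=4, goal_achieved=True),
    DialogueResult(num_turns=6, goal_achieved=False),
    DialogueResult(num_turns=5, goal_achieved=True),
    DialogueResult(num_turns=3, goal_achieved=True),
]

print(success_rate(results))   # 0.75
print(average_turns(results))  # 4.5
```

Under these assumptions, a more proactive system would show up as a lower average turn count at an equal or higher success rate.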
Discussion / Conclusion. In this study, we present the ProToD model, an enhancement of the LLM-induced ToD system that incorporates future dialogue action anticipation and goal-oriented reward motivation. By using future actions as cues to guide LLMs, our model offers more comprehensive responses and enhances the efficiency of dialogues. Integrating goal-oriented rewards through a reinforcement learning framework further fine-tunes these cues, resulting in improved dialogue task completion rates. Additionally, we introduce a goal-driven user simulation assessment based on GPT-4, providing a novel perspective for evaluating dialogue efficiency and user satisfaction. Our validation assesses the effectiveness of ProToD by examining improvements in the Inform and Success metrics on the MultiWoZ 2.1 dataset. Furthermore, we present case studies and user simulation assessments that illustrate the improvements in dialogue efficiency and user satisfaction achieved by our model.