Improving Generalization in Task-oriented Dialogues with Workflows and Action Plans

Paper · arXiv 2306.01729 · Published June 2, 2023
Task Planning · Action Models · Tool Use and Computer-Use Agents


Task-oriented dialogue is difficult in part because it involves understanding user intent, collecting information from the user, executing API calls, and generating helpful and fluent responses. For complex tasks, however, one must also do all of these things correctly over multiple steps and in a specific order. While large pre-trained language models can be fine-tuned end-to-end to create multi-step task-oriented dialogue agents that generate fluent text, our experiments confirm that this approach alone cannot reliably perform new multi-step tasks that are unseen during training. To address these limitations, we augment the dialogue contexts given to text2text transformers with known valid workflow names and action plans. Action plans consist of sequences of actions required to accomplish a task, and are encoded as simple sequences of keywords (e.g. verify-identity, pull-up-account, reset-password). We perform extensive experiments on the Action-Based Conversations Dataset (ABCD) with T5-small, base and large models, and show that such models: a) are able to more readily generalize to unseen workflows by following the provided plan, and b) are able to generalize to executing unseen actions if they are provided in the plan.
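To make the idea of an augmented dialogue context concrete, here is a minimal sketch of how a workflow name and an action plan might be prepended to the dialogue history before it is passed to a text2text model. The function name `build_model_input`, the field labels (`workflow:`, `plan:`, `context:`), and the separators are illustrative assumptions, not the serialization used in the paper.

```python
from typing import List, Optional, Tuple


def build_model_input(
    turns: List[Tuple[str, str]],
    workflow: Optional[str] = None,
    action_plan: Optional[List[str]] = None,
) -> str:
    """Serialize a dialogue into one input string for a text2text model.

    The layout is hypothetical: the optional workflow name and action plan
    are placed in front of the dialogue history as plain keywords.
    """
    parts = []
    if workflow is not None:
        parts.append(f"workflow: {workflow}")
    if action_plan:
        # The plan is just an ordered sequence of action keywords.
        parts.append("plan: " + ", ".join(action_plan))
    dialogue = " ".join(f"{speaker}: {text}" for speaker, text in turns)
    parts.append("context: " + dialogue)
    return " | ".join(parts)


example_input = build_model_input(
    [("customer", "I forgot my password."), ("agent", "I can help with that.")],
    workflow="reset-password",
    action_plan=["verify-identity", "pull-up-account", "reset-password"],
)
print(example_input)
```

Omitting `workflow` and `action_plan` yields only the dialogue history, which would correspond to the plain end-to-end setting; swapping in a different plan at inference time is the mechanism by which an unseen workflow could be supplied to the model.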

Introduction. Dialogue allows service agents and users to accomplish complex tasks flexibly and naturally. However, such dialogues are challenging for automated agents because success requires the tracking of long-range information and correct behaviour at every step of multi-step tasks. In service-focused task-oriented dialogue, an agent and a user interact back-and-forth with natural language text to reach a goal determined by the user. The agent must identify the task the user intends to solve, collect relevant information from the user, and execute actions until the task is complete. The possible set of actions and the order in which they are accomplished depend on the specific task and environment. Recent work has applied modern large language models (LLMs), e.g. Raffel et al. (2020); Brown et al. (2020), etc., to complex structured reasoning tasks including task-oriented dialogue (Hosseini-Asl et al., 2020; Peng et al., 2021; He et al., 2022; Peng et al., 2021; Ham et al., 2020; Gao et al., 2020).

Discussion / Conclusion. By training T5 text2text models for task-oriented dialogue and augmenting the dialogue context with plan information, we show that large language models can and do make use of provided action plans and are able to generalize to new action sequences, flows, and actions beyond those seen in training. An advantage of our framework is that we could obtain workflow prompts from a symbolic planning mechanism or other types of external API calls. Symbolic planner correctness guarantees enable a maintainable system, allowing new workflows to be added and intermediate steps to be changed and re-assembled; the resulting plans could in turn be used to generate novel prompts. Appendix B shows how a set of workflows can be formulated as symbolic STRIPS planning (Fikes and Nilsson, 1971), and how adding an additional slot to an action automatically adjusts existing workflows, showing how symbolic planning is a promising direction for intentional modification of workflows.
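The STRIPS idea can be illustrated with a small sketch. The breadth-first forward search below is a generic toy planner, not the formulation from Appendix B, and every fact name, precondition, and the extra `ask-pin` action are hypothetical choices made for illustration. It shows how adding one requirement to an action changes the generated plan without editing the workflow by hand.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]                  # facts that must hold before the action
    add: FrozenSet[str]                  # facts made true by the action
    delete: FrozenSet[str] = frozenset() # facts made false by the action


def plan(init: Iterable[str], goal: Iterable[str],
         actions: List[Action]) -> Optional[List[str]]:
    """Breadth-first forward search; returns a shortest plan or None."""
    start, target = frozenset(init), frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if target <= state:
            return steps
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [a.name]))
    return None


# Hypothetical domain built from the action keywords in the abstract.
verify = Action("verify-identity", frozenset(), frozenset({"identity-verified"}))
pull_up = Action("pull-up-account", frozenset({"identity-verified"}),
                 frozenset({"account-open"}))
reset = Action("reset-password", frozenset({"account-open"}),
               frozenset({"password-reset"}))

base_plan = plan([], ["password-reset"], [verify, pull_up, reset])

# Assumed change: reset-password now also needs a PIN, collected by a new action.
ask_pin = Action("ask-pin", frozenset({"identity-verified"}),
                 frozenset({"pin-collected"}))
reset_v2 = reset._replace(pre=reset.pre | {"pin-collected"})

new_plan = plan([], ["password-reset"], [verify, pull_up, ask_pin, reset_v2])
print(base_plan)
print(new_plan)
```

In this toy domain the second plan gains an `ask-pin` step before `reset-password` purely as a consequence of the changed precondition. A plan produced this way could then be serialized as a keyword sequence and supplied to the dialogue model as its action plan.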