LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

Paper · arXiv 2304.11477 · Published April 22, 2023
Task PlanningDomain Specialization in LLMs

Abstract: Large language models (LLMs) have demonstrated remarkable zeroshot generalization abilities: state-of-the-art chatbots can provide plausible answers to many common questions that arise in daily life. However, so far, LLMs cannot reliably solve long-horizon planning problems. By contrast, classical planners, once a problem is given in a formatted way, can use efficient search algorithms to quickly identify correct, or even optimal, plans. In an effort to get the best of both worlds, this paper introduces LLM+P, the first framework that incorporates the strengths of classical planners into LLMs. LLM+P takes in a natural language description of a planning problem, then returns a correct (or optimal) plan for solving that problem in natural language. LLM+P does so by first converting the language description into a file written in the planning domain definition language (PDDL), then leveraging classical planners to quickly find a solution, and then translating the found solution back into natural language. Along with LLM+P, we define a diverse set of different benchmark problems taken from common planning scenarios.

Introduction. Ever since the birth of the field, AI researchers have sought to create programs that can converse in natural language with the same grace and flexibility as people. While even relatively simple models, such as Eliza from 1966 [1], can generate responses to some prompts that seem reasonable, it has always been relatively easy to generate prompts that expose their weaknesses compared to people — their lack of true “understanding.” While large language models (LLMs) such as GPT-4 [2] and ChatGPT [3] have far surpassed expectations of just a few years ago, they are no different in this respect. Indeed the internet is now awash with examples of people reveling in getting ChatGPT to generate output that even a 5-year-old human child would know to be ill-advised. Given how LLMs are designed and trained, this phenomenon should come as no surprise. They are not specifically built to demonstrate understanding. They are trained to generate sequences of words that might be plausible to a human given a prior context. In the terms of Mahowald et al.

Discussion / Conclusion. In this work, we propose to leverage classical planners to empower large language models with optimal planning capabilities. The key design choice of the proposed LLM+P framework is to focus LLMs on translating the planning problem from natural language to structured PDDL format. Moreover, we show that it is important to also make LLMs aware of a simple (problem, PDDL) pair as a demonstration (or the context) for in-context learning. Some interesting directions to further extend the LLM+P framework include: 1) enabling the LLM to auto-detect when and how to apply LLM+P; and 2) reducing LLM+P’s dependency on information by humans, potentially involving finetuning.