Experimental Design for Active Transductive Inference in Large Language Models

Paper · arXiv 2404.08846 · Published April 12, 2024
Prompts and Prompting

One emergent ability of large language models (LLMs) is that query-specific examples can be included in the prompt at inference time. In this work, we use active learning for adaptive prompt design and call it Active In-context Prompt Design (AIPD). We design the LLM prompt by adaptively choosing few-shot examples from a training set to optimize performance on a test set. The training examples are initially unlabeled and we obtain the labels of the most informative ones, which maximally reduce uncertainty in the LLM prediction. We propose two algorithms, GO and SAL, which differ in how the few-shot examples are chosen. We analyze these algorithms in linear models: first GO, and then SAL through its equivalence with GO. We experiment with many different tasks in small, medium-sized, and large language models, and show that GO and SAL outperform other methods for choosing few-shot examples in the LLM prompt at inference time.
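
To make the linear-model view concrete, the following is a minimal sketch of greedy example selection that reduces predictive uncertainty on a test set. It is an illustration under stated assumptions, not the paper's GO or SAL implementation: the function name `greedy_design`, the `prior_precision` parameter, the unit noise variance, and the max-over-test-points criterion are all choices made here for the example. In a Bayesian linear model the posterior covariance depends only on the features of the labeled examples, so the selection can be made before any label is observed.

```python
import numpy as np

def greedy_design(X_train, X_test, budget, prior_precision=1.0):
    """Greedily pick `budget` rows of X_train to label (illustrative sketch).

    Assumes a Bayesian linear model with unit noise variance and an
    isotropic Gaussian prior; both are assumptions of this example.
    Each step picks the unlabeled row whose inclusion minimizes the
    largest posterior predictive variance over the rows of X_test.
    """
    n, d = X_train.shape
    cov = np.eye(d) / prior_precision  # posterior covariance before any labels
    chosen = []
    for _ in range(budget):
        best_i, best_score, best_cov = None, np.inf, None
        for i in range(n):
            if i in chosen:
                continue
            x = X_train[i]
            cx = cov @ x
            # Sherman-Morrison rank-one update: covariance after labeling x
            new_cov = cov - np.outer(cx, cx) / (1.0 + x @ cx)
            # Predictive variance x_*^T Sigma x_* for every test row, then the max
            score = np.max(np.einsum("ij,jk,ik->i", X_test, new_cov, X_test))
            if score < best_score:
                best_i, best_score, best_cov = i, score, new_cov
        chosen.append(best_i)
        cov = best_cov
    return chosen, cov
```

In an in-context setting, the rows of `X_train` and `X_test` would be embeddings of the unlabeled training examples and the test queries (which embedding to use is left open here); the returned indices are the examples to label and place in the prompt.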

Introduction. Large language models (LLMs), such as Vicuna [Chiang et al., 2023], Falcon-40B [Penedo et al., 2023], and OpenLLaMA [Touvron et al., 2023], are mainly applied in two ways: fine-tuning and prompt tuning. In fine-tuning, the LLM weights are adapted to a downstream task [Devlin et al., 2018]. Fine-tuning can easily incorporate domain knowledge that a pre-trained model may not possess and resembles classic inductive inference. Fine-tuned models often do not need carefully designed prompts, which makes them easier to deploy. The main drawback of fine-tuning is that it can be costly, because tens of thousands of training examples may be needed to fine-tune billions of parameters of the LLM [Ding et al., 2023]. In prompt tuning, the LLM weights are fixed and the LLM is given query-specific examples at inference time that affect its output [Lester et al., 2021]. This ability to conduct in-context inference is one of the emergent abilities of LLMs. Prompt tuning does not require large training sets.

Discussion / Conclusion. In this paper, we studied the framework of active in-context prompt design (AIPD) that uses optimal design to systematically choose the most informative unlabeled examples to label for a set of test examples. These informative labeled examples are then used to minimize the prediction error of the LLM for all the test examples. To our knowledge, this is the first paper that studies optimal design for adaptive prompt design. Inspired by the linear model, we proposed an algorithm, GO, that strategically chooses the most informative examples that minimize the variance of the posterior covariance for any test example from the test set. We proposed a second algorithm, SAL, that uses simulations to estimate how much unlabeled examples reduce LLM uncertainty for all test examples. It then chooses to label examples that maximally reduce the uncertainty of the LLM for all