CONTROL PREFIXES for Parameter-Efficient Text Generation

Paper · arXiv 2110.08329 · Published October 15, 2021
Training and Fine-Tuning

Prefix-tuning is a powerful lightweight technique for adapting a large pre-trained language model to a downstream application. However, it uses the same dataset-level tuned prompt for all examples in the dataset. We extend this idea and propose a dynamic method, CONTROL PREFIXES, which allows for the inclusion of conditional input-dependent information, combining the benefits of prompt tuning and controlled generation. The method incorporates attribute-level learnable representations into different layers of a pre-trained transformer, allowing for the generated text to be guided in a particular direction. We provide a systematic evaluation of the technique and apply it to five datasets from the GEM benchmark for natural language generation (NLG). Although the aim is to develop a parameter-efficient model, using only 0.1–3% trainable parameters, we show CONTROL PREFIXES can even outperform full fine-tuning methods. We present state-of-the-art results on several data-to-text datasets, including WebNLG.
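The core idea can be illustrated with a minimal sketch: each layer receives a task-level prefix shared by every example, plus a control prefix selected by the example's attribute label. This is not the authors' implementation. The class name `ControlPrefixSketch`, the attribute labels, and all sizes are hypothetical, and a single matrix per layer stands in for the key and value vectors that prefix-tuning would normally learn.

```python
import random


def make_prefix(length, dim, rng):
    """Return a length x dim matrix of small random values (stand-in for learned weights)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(length)]


class ControlPrefixSketch:
    """Hypothetical container for per-layer task and attribute prefixes."""

    def __init__(self, num_layers, dim, task_len, control_len, attributes, seed=0):
        rng = random.Random(seed)
        # One task-level prefix per layer, shared by every example in the dataset.
        self.task = [make_prefix(task_len, dim, rng) for _ in range(num_layers)]
        # One control prefix per attribute value and per layer (assumed layout).
        self.control = {
            attr: [make_prefix(control_len, dim, rng) for _ in range(num_layers)]
            for attr in attributes
        }

    def prefixes_for(self, attribute):
        """Per-layer prefix for one example: attribute rows followed by task rows."""
        return [
            self.control[attribute][layer] + self.task[layer]
            for layer in range(len(self.task))
        ]


model = ControlPrefixSketch(
    num_layers=2, dim=4, task_len=3, control_len=1, attributes=["attr_a", "attr_b"]
)
prefix_a = model.prefixes_for("attr_a")
prefix_b = model.prefixes_for("attr_b")
```

In this sketch, two examples with different attribute labels share the same task rows at every layer and differ only in the attribute rows, while the base language model itself would stay frozen.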

Introduction. Recently, approaches in text generation have been dominated by adapting one large-scale, pre-trained language model (PLM) to various downstream tasks. Such adaptation is often performed via fine-tuning, which necessitates updating and storing all of the parameters, resulting in multiple new language models (LMs), one for each task. This poses a considerable challenge to the deployment of NLP systems in practice, especially as the scale of PLMs continues to climb from millions to billions of parameters. Moreover, full fine-tuning has been shown to be unnecessarily profligate through overwriting natural language understanding (NLU) that could otherwise be shared among tasks (Peters et al., 2019); it has also been shown that fine-tuned networks do not deviate substantially from the pre-trained one in parameter space (Aghajanyan et al., 2020; Radiya-Dixit and Wang, 2020), implying the existence of parameter-efficient alternatives. Many researchers have sought to alleviate these issues by using fixed-LM techniques, where the parameters of the base LM remain unchanged.

Discussion / Conclusion. CONTROL PREFIXES consistently outperforms prefix-tuning + control tokens on the data-to-text and summarization datasets, while the results are both comparable to the Gold References on simplification datasets. This indicates that CONTROL PREFIXES is a superior parameter-efficient framework in leveraging additional information, whilst maintaining the fixed-LM property. The alternative method is less expressive than CONTROL PREFIXES, as it exerts control only through the embeddings rather than through each layer. CONTROL PREFIXES fundamentally depends on the strength of the guidance signal, and by adding the constraint that attribute information must be available with the dataset, the guidance signal is naturally weaker. However, we show that CONTROL PREFIXES is a powerful general method which can utilize this signal to achieve a modest but consistent improvement across an array of tasks. We introduce CONTROL PREFIXES, a parameter-efficient controlled generation technique, which integrates a task-specific prompt alongside dynamic prompts to leverage additional input-level information.
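One way to make the expressiveness contrast concrete is to count the attribute-specific parameters each scheme could learn. The arithmetic below is a rough sketch under stated assumptions, not figures from the paper: a control token is taken to be one embedding vector per attribute value, a control prefix is taken to be a key and a value vector per prefix position at every layer, any reparameterization is ignored, and the model sizes are hypothetical.

```python
def control_token_params(num_attributes, dim):
    """One learned embedding vector per attribute value, used at the input layer only."""
    return num_attributes * dim


def control_prefix_params(num_attributes, dim, num_layers, control_len):
    """A key and a value vector per prefix position, per layer, per attribute value."""
    return num_attributes * num_layers * control_len * 2 * dim


# Hypothetical sizes chosen only for illustration.
token_count = control_token_params(num_attributes=3, dim=1024)
prefix_count = control_prefix_params(
    num_attributes=3, dim=1024, num_layers=24, control_len=2
)
print(token_count, prefix_count)  # prints "3072 294912"
```

Under these assumptions the per-layer scheme has 96 times as many attribute-specific parameters, which is one reading of why control applied only at the embeddings is described as less expressive. Both counts remain small next to a model with hundreds of millions of frozen weights.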