ReFT: Representation Finetuning for Language Models
Parameter-efficient finetuning (PEFT) methods seek to adapt large models via updates to a small number of weights. However, much prior interpretability work has shown that representations encode rich semantic information, suggesting that editing representations might be a more powerful alternative. Here, we pursue this hypothesis by developing a family of Representation Finetuning (ReFT) methods. ReFT methods operate on a frozen base model and learn task-specific interventions on hidden representations. We define a strong instance of the ReFT family, Low-rank Linear Subspace ReFT (LoReFT). LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10×–50× more parameter-efficient than prior state-of-the-art PEFTs. We showcase LoReFT on eight commonsense reasoning tasks, four arithmetic reasoning tasks, Alpaca-Eval v1.0, and GLUE. In all these evaluations, LoReFT delivers the best balance of efficiency and performance, and almost always outperforms state-of-the-art PEFTs. We release a generic ReFT training library publicly at https://github.com/stanfordnlp/pyreft.
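To make "learned interventions on hidden representations" concrete, here is a minimal sketch. It assumes the LoReFT form defined in the body of the paper (not reproduced in this section): Phi(h) = h + R^T (W h + b - R h), where h is a d-dimensional hidden representation, R is an r x d matrix with orthonormal rows, and W (r x d) and b (length r) are learned. This is a toy pure-Python illustration. The helper names (`matvec`, `transpose`, `orthonormal_rows`, `loreft`) are hypothetical. It is not the pyreft API, and it performs no training.

```python
import random

def matvec(M, v):
    # Multiply a matrix stored as a list of rows by a vector.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    # Swap rows and columns of a list-of-rows matrix.
    return [list(col) for col in zip(*M)]

def orthonormal_rows(r, d, rng):
    # Gram-Schmidt on random Gaussian vectors: returns an r x d matrix
    # whose rows are orthonormal (requires r <= d).
    rows = []
    while len(rows) < r:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:
            proj = sum(a * c for a, c in zip(v, u))
            v = [a - proj * c for a, c in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-8:
            rows.append([a / norm for a in v])
    return rows

def loreft(h, R, W, b):
    # Phi(h) = h + R^T (W h + b - R h): replace the projection of h onto the
    # subspace spanned by R's rows with the learned target W h + b.
    Rh = matvec(R, h)
    target = [w + bb for w, bb in zip(matvec(W, h), b)]
    delta = [t - s for t, s in zip(target, Rh)]
    edit = matvec(transpose(R), delta)
    return [hi + ei for hi, ei in zip(h, edit)]
```

Because the rows of R are orthonormal, R Phi(h) = W h + b. The intervention therefore rewrites h only inside the r-dimensional subspace spanned by the rows of R and leaves the orthogonal complement untouched. Each intervened position uses 2rd + r parameters (R, W, b), and the base model's weights are never modified.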
Introduction. Pretrained LMs are frequently finetuned to adapt them to new domains or tasks [Dai and Le, 2015]. With finetuning, a single base model can be adapted to a variety of tasks given only small amounts of in-domain data. However, finetuning the entire model is expensive, especially for very large LMs. Parameter-efficient finetuning (PEFT) methods aim to reduce the high cost of full finetuning by updating only a small fraction of weights [Han et al., 2024]. This reduces memory usage and training time, and PEFTs have been shown to achieve performance similar to full finetuning in many practical settings [Hu et al., 2023]. Adapters, a common family of PEFTs, learn either an edit that is added to a subset of the model's weights or an additional set of weights that operates alongside the frozen base model. Recent adapters such as LoRA [Hu et al., 2022] and its variants, such as DoRA [Liu et al., 2024b], reduce the number of trainable parameters in the learned weight updates by using low-rank approximations in place of full weight matrices during adapter training.
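The following sketch shows why low-rank weight updates save parameters. It assumes the standard LoRA parameterisation: a frozen weight W0 of shape d_out x d_in is augmented with a product B A, where B is d_out x r and A is r x d_in, and the product is multiplied by a scalar. The `scale` argument stands in for LoRA's scaling hyperparameter. The function names are hypothetical, and the dimensions in the usage note are illustrative rather than taken from the paper.

```python
def matvec(M, v):
    # Multiply a matrix stored as a list of rows by a vector.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def full_update_params(d_out, d_in):
    # A dense weight update has one trainable entry per weight.
    return d_out * d_in

def lora_update_params(d_out, d_in, rank):
    # Low-rank update B A with B: d_out x rank and A: rank x d_in.
    return rank * (d_out + d_in)

def lora_forward(W0, A, B, x, scale=1.0):
    # y = W0 x + scale * B (A x); W0 stays frozen, only A and B would be trained.
    base = matvec(W0, x)
    low_rank = matvec(B, matvec(A, x))
    return [y0 + scale * y1 for y0, y1 in zip(base, low_rank)]
```

Consider a hypothetical 4096 x 4096 projection with rank 8. Here `full_update_params(4096, 4096)` is 16,777,216 trainable entries, while `lora_update_params(4096, 4096, 8)` is 65,536, a 256x reduction. Computing B (A x) rather than forming B A explicitly keeps the extra compute proportional to the rank.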
Discussion / Conclusion. In this paper, we propose LoReFT, a strong alternative to PEFTs. LoReFT achieves strong performance across benchmarks from four domains while being 10×–50× more parameter-efficient than prior state-of-the-art PEFTs. Notably, LoReFT establishes new state-of-the-art performance on commonsense reasoning, instruction-following, and natural language understanding against the strongest PEFTs. We also show how our method can be described under a generic framework, ReFT. ReFT is a new approach to finetuning that is more powerful, more parameter-efficient, and more interpretable than existing PEFTs. We hope our work serves as an initial call for the community to study ReFTs. We also hope to understand why ReFT works. We provide some early explorations in our supplementary materials, focusing on memorisation (appendices E and F) and compositional merging of ReFT weights (appendix G).
ReFT, abstraction, and generation. Neural network interpretability research often struggles to contribute directly to improving models.