Extracting memorized pieces of (copyrighted) books from open-weight language models
Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) have memorized plaintiffs’ protected expression. Drawing on adversarial ML and copyright law, we show that these polarized positions dramatically oversimplify the relationship between memorization and copyright. To do so, we leverage a recent probabilistic extraction technique to extract pieces of the Books3 dataset from 13 open-weight LLMs. Through numerous experiments, we show that it’s possible to extract substantial parts of at least some books from different LLMs. This is evidence that the LLMs have memorized the extracted text; this memorized content is copied inside the model parameters. But the results are complicated: the extent of memorization varies both by model and by book. With our specific experiments, we find that the largest LLMs don’t memorize most books—either in whole or in part. However, we also find that LLAMA 3.1 70B memorizes some books, like Harry Potter and 1984, almost entirely. We discuss why our results have significant implications for copyright cases, though not ones that unambiguously favor either side.
Introduction. In the dozens of pending copyright suits over training LLMs, the opposing parties have tended to present the technical operation of models in simplified terms. Plaintiffs say LLMs are just giant (infringing) copy machines that store their works and recombine them in their outputs [67]. Defendants say LLMs merely contain linguistic relationships—“statistical correlations” [26]—and don’t copy the plaintiffs’ works. The situation is more complicated than either side suggests. Appreciating why requires a deeper understanding of training-data extraction, training-data memorization, and the relationship between the two (Section 2). While extraction refers to recovering specific training data from a model’s generated outputs, memorization is broader: it involves reconstructing specific training data by examining the model “through any means” [28, Glossary].
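To make the notion of extraction more concrete, the following is a minimal sketch of the arithmetic behind a probabilistic view of extraction. It assumes that a model assigns a conditional probability to each token of a target suffix given a prefix, that the probability of generating the whole suffix in one sample is the product of those per-token probabilities, and that repeated samples are independent. The function names and the toy numbers are hypothetical, chosen only for illustration. This is not the experimental code behind the results reported here.

```python
import math

def sequence_probability(token_logprobs):
    """Probability of generating the whole target suffix in one sample.

    Assumes token_logprobs holds the natural-log conditional probability
    of each suffix token given the prefix and all earlier suffix tokens.
    """
    return math.exp(sum(token_logprobs))

def prob_extracted_within(n, p_z):
    """Probability that at least one of n independent samples
    reproduces the target suffix, given per-sample probability p_z."""
    return 1.0 - (1.0 - p_z) ** n

def min_trials(p_z, p):
    """Smallest n such that prob_extracted_within(n, p_z) >= p.

    Illustrative helper; requires 0 < p < 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p_z <= 0.0:
        return math.inf  # the suffix can never be generated
    if p_z >= 1.0:
        return 1         # the suffix is generated on every sample
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - p_z))

if __name__ == "__main__":
    # Toy example: a 3-token suffix with hypothetical per-token probabilities.
    logprobs = [math.log(0.9), math.log(0.8), math.log(0.95)]
    p_z = sequence_probability(logprobs)
    print(f"per-sample probability: {p_z:.3f}")
    print(f"at least once in 5 samples: {prob_extracted_within(5, p_z):.3f}")
    print(f"samples needed for 99%: {min_trials(p_z, 0.99)}")
```

Under these assumptions, a suffix with a high per-sample probability needs very few samples to be reproduced, while a suffix with a tiny per-sample probability may need an impractically large number. This is one way to see why "how much has the model memorized" does not have a single yes-or-no answer.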
Discussion / Conclusion. Our results complicate the simplified narratives that both plaintiffs and defendants typically use in copyright cases to describe how LLMs work. The evidence supports the positions of plaintiffs in some respects and of defendants in other respects. More generally, we show that the extent of memorization varies with model size, the specific choice of model, the book tested, and even across different parts of an individual book (Sections 3 and 4). We see three primary implications of our results for copyright disputes. Building on the novel probabilistic extraction method of Hayes et al. [54], we show that the extent of verbatim memorization of books from the Books3 dataset is more significant than previously described. We also show that memorization varies widely from model to model, from book to book within each model, and across different parts of individual books. Our results complicate current disputes over copyright infringement, both by rejecting easy claims made by both sides about how models work and by demonstrating that there is no single answer to the question of how much a model memorizes.