Does repeated sensitive data in fine-tuning cause memorization?
When language models train on the same private or proprietary data multiple times, how much do they end up memorizing and leaking that information at inference time? Understanding this risk is critical for organizations fine-tuning on confidential datasets.
Memorization is most dangerous exactly where organizations fine-tune on proprietary or personal data. Controlled experiments across GPT-2, Phi-3, and Gemma-2 quantify the risk: fine-tuning with repeated sensitive data raises privacy-leakage rates from a 0-5% baseline to 60-75% (a 64.2% average increase), because repeated exposure pushes the model toward near-verbatim reproduction at inference. This is the concrete mechanism behind the claim that in-weight learning overwrites and memorizes.
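One way to make a "leakage rate" concrete is sketched below. It is a minimal illustration, not the metric used in the experiments: it assumes a secret counts as leaked when some model output contains a contiguous run covering at least a chosen fraction of that secret. The function names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def longest_shared_run(secret: str, output: str) -> int:
    """Length of the longest contiguous substring shared by secret and output."""
    matcher = SequenceMatcher(None, secret, output, autojunk=False)
    match = matcher.find_longest_match(0, len(secret), 0, len(output))
    return match.size


def leakage_rate(secrets, outputs, threshold=0.8):
    """Fraction of secrets that some output reproduces near-verbatim.

    Assumption: "near-verbatim" means a contiguous shared run covering at
    least `threshold` of the secret's length. The source does not specify
    its criterion, so this is only a stand-in.
    """
    if not secrets:
        return 0.0
    leaked = 0
    for secret in secrets:
        needed = threshold * len(secret)
        if any(longest_shared_run(secret, out) >= needed for out in outputs):
            leaked += 1
    return leaked / len(secrets)


if __name__ == "__main__":
    secrets = ["SSN 123-45-6789", "card 4111 1111 1111 1111"]
    outputs = [
        "The record says SSN 123-45-6789 for this user.",
        "I cannot share card details.",
    ]
    print(leakage_rate(secrets, outputs))
```

Comparing this rate on outputs sampled before and after fine-tuning gives the kind of baseline-versus-fine-tuned contrast the note describes; the threshold controls how strict "near-verbatim" is.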
The constructive half rebuts the assumed privacy-utility tradeoff. A layered framework (semantic data deduplication, differential privacy during generation, entropy-based filtering, and pattern-based content filtering) drives leakage to 0% while retaining 94.7% of original utility. The takeaway is that privacy and performance are not inherently incompatible in fine-tuned LLMs: the defenses are complementary and operate at different stages (data, generation, output), so stacking them closes the gap without gutting capability.
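The staging idea can be sketched in a few lines. This is an illustrative sketch under loud assumptions, not the framework from the experiments: exact-match deduplication after normalization stands in for semantic deduplication, character-level Shannon entropy stands in for the entropy-based filter, the two regular expressions are hypothetical examples of sensitive patterns, and differential privacy during generation is omitted entirely.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for illustration only; a real system needs a vetted set.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),  # email-like address
]


def deduplicate(records):
    """Data stage: keep the first copy of each record.

    Assumption: case/whitespace normalization replaces the semantic
    (embedding-based) matching a real pipeline would use.
    """
    seen, kept = set(), []
    for record in records:
        key = " ".join(record.lower().split())
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


def char_entropy(token):
    """Shannon entropy of a token's characters, in bits per character."""
    counts = Counter(token)
    n = len(token)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def redact_high_entropy(text, min_len=16, min_bits=3.5):
    """Output stage: mask long, random-looking tokens (e.g. key-like strings).

    Thresholds are illustrative guesses. Whitespace is collapsed to single
    spaces as a side effect of splitting and rejoining.
    """
    tokens = [
        "[REDACTED]" if len(t) >= min_len and char_entropy(t) >= min_bits else t
        for t in text.split()
    ]
    return " ".join(tokens)


def redact_patterns(text):
    """Output stage: mask spans matching known sensitive formats."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def filter_output(text):
    """Apply both output-stage filters in sequence."""
    return redact_patterns(redact_high_entropy(text))


if __name__ == "__main__":
    print(deduplicate(["Alice SSN 123-45-6789", "alice  ssn 123-45-6789", "Bob"]))
    print(filter_output("key sk9fQ2xLp7ZrT4vB1mNc for 123-45-6789"))
```

Deduplication removes the repetition that drives memorization before training, while the two output filters catch different residue: formatted identifiers on one side, random-looking strings on the other. That separation is what makes the stages complementary rather than redundant.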
This is the privacy face of the in-weight-learning cost documented elsewhere. It supplies the mechanism behind "Can models store unlimited facts without growing larger?" (fine-tuning facts into weights is exactly what memorizes), and it complements "When do language models stop memorizing and start generalizing?": that note bounds capacity in theory, while this one shows repetition saturating that capacity into leakage in fine-tuning practice.
Related concepts in this collection 3
- Can models store unlimited facts without growing larger?
  Does external tool use let language models recall facts without being constrained by parameter count? This matters because it could reshape how we scale knowledge capacity beyond architectural limits.
  Relation: fine-tuning facts into weights is the memorization mechanism this note measures and mitigates.
- When do language models stop memorizing and start generalizing?
  Can we measure the exact capacity limit where models transition from memorizing training data to learning underlying patterns? Understanding this boundary could reshape how we think about model learning and privacy.
  Relation: that note gives the theoretical capacity bound; this one covers the fine-tuning-practice leakage that fills it.
- Do reasoning traces actually expose private user data?
  Explores whether language models leak sensitive information through their internal reasoning steps, even when explicitly instructed not to. Investigates the mechanisms and scale of privacy exposure in reasoning traces.
  Relation: a different leakage channel (recollection in traces) for the same privacy concern.
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models
- On the Impact of Fine-Tuning on Chain-of-Thought Reasoning
- Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers
- How much do language models memorize?
- Spurious Forgetting in Continual Learning of Language Models
- How new data permeates LLM knowledge and how to dilute it
- Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
- The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities
Original note title
fine-tuning on repeated sensitive data drives memorization from five percent to sixty to seventy-five percent but layered mitigations reach zero leakage at ninety-five percent utility