SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation Language, Text, and Discourse Model Architecture and Internals

Can we defend RAG systems from corpus poisoning without retraining?

Explores whether retrieval-time defenses can catch and block poisoned documents before they reach the generator, without expensive retraining cycles. Matters because corpus updates outpace model retraining in production RAG systems.

Synthesis note · 2026-05-03
Where do retrieval systems fail and why?

RAG poisoning attacks insert malicious documents into the retrieval corpus so they get pulled in for matching queries and steer generation toward attacker-preferred outputs. Existing defenses typically require retraining the retriever or the generator, which is expensive and slow to deploy. RAGPart and RAGMask propose two lightweight defenses that operate at retrieval time without modifying the generation model.

RAGPart exploits a structural property of dense retrievers: they learn discriminative patterns from how the training data is partitioned, which means malicious documents inserted into one partition have predictably limited influence on retrieval from queries that match a different partition. By configuring partitions deliberately, the system bounds how far any single poisoned document can propagate. RAGMask takes a different angle: it masks tokens in candidate documents and watches for abnormal similarity shifts. Genuine documents are robust to token masking — their similarity scores degrade smoothly — while poisoned documents that rely on specific trigger tokens show sudden similarity collapse, which serves as a detection signal.

The architectural significance is that defense need not be coupled to training. Both methods sit at the retrieval layer and treat the generator as an untrusted black box that must be protected from upstream corruption. This separation matters operationally because retrieval corpora update faster than retrievers can be retrained, so defenses that require retraining are always behind the threat. The threat surface is real and severe — How vulnerable is GraphRAG to tiny text manipulations? shows even minimal corpus modifications can devastate accuracy in graph-structured RAG.

Inquiring lines that use this note as a source 38

This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.

Related concepts in this collection 3

This note in its neighbourhood — explore the map, then jump to a related concept in the list below.

Concept map
15 direct connections · 122 in 2-hop network ·medium cluster Open in graph ↗

Click a node to walk · click center to open · click Open in graph to see this note in the full knowledge graph

your link semantically near linked from elsewhere

Related papers in this collection 8

Papers most semantically related to this note, ranked by cosine similarity in the embedding space.

Original note title

RAG corpus poisoning has lightweight defenses without retraining — partition-aware retrieval and token-masking similarity shifts catch attacks the generator never sees