INQUIRING LINE

Reasoning, Retrieval, and Evaluation · Language, Text, and Discourse · Psychology, Society, and Alignment · cross-cluster

Can provenance tracking prevent synthetic content from polluting the corpus?

This explores whether tracking where content came from (tagging it as human-written, AI-generated, or verified) can actually keep machine-made text from contaminating a knowledge base. The corpus suggests provenance is necessary but does most of its work as a *gate at write-time*, not a label after the fact.

The most direct answer the collection offers is that provenance isn't a passive label you attach; it's a gate you enforce at the moment something tries to enter. The clearest example is bidirectional RAG, which lets a system grow its own knowledge base from its generated answers, but only after each candidate passes entailment verification, source attribution, and novelty checks (Can RAG systems safely learn from their own generated answers?). That's provenance as an admission test: prove you're grounded, or you don't get in. The same instinct shows up in grounded refusal, where a system would rather decline to answer than emit ungrounded text into circulation (Can RAG systems refuse to answer without reliable evidence?).
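The admission test described above can be sketched in a few lines. This is a minimal illustration, not the method from the cited note: `Candidate`, `admit`, and the 0.9 novelty threshold are hypothetical names and values, and `toy_entails` and `toy_similarity` are crude token-overlap placeholders standing in for a real entailment model and a real embedding similarity.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    text: str               # generated answer proposed for the corpus
    sources: Sequence[str]  # IDs of the documents the answer claims to rest on


def _tokens(s: str) -> set[str]:
    return set(s.lower().split())


def toy_similarity(a: str, b: str) -> float:
    """Placeholder: Jaccard overlap of whitespace tokens."""
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Placeholder: every hypothesis token must appear in the premise."""
    return _tokens(hypothesis) <= _tokens(premise)


def admit(
    candidate: Candidate,
    corpus: dict[str, str],
    entails: Callable[[str, str], bool] = toy_entails,
    similarity: Callable[[str, str], float] = toy_similarity,
    novelty_threshold: float = 0.9,  # assumed value, tune for a real system
) -> tuple[bool, str]:
    """Return (admitted, reason). All three checks must pass."""
    # Source attribution: every cited ID must exist in the corpus.
    if not candidate.sources:
        return False, "no sources cited"
    missing = [s for s in candidate.sources if s not in corpus]
    if missing:
        return False, f"unknown sources: {missing}"

    # Entailment: the cited passages, taken together, must support the text.
    premise = "\n".join(corpus[s] for s in candidate.sources)
    if not entails(premise, candidate.text):
        return False, "not entailed by cited sources"

    # Novelty: reject near-duplicates of what is already stored.
    if any(similarity(candidate.text, doc) >= novelty_threshold
           for doc in corpus.values()):
        return False, "not novel"

    return True, "admitted"
```

The point of the design is ordering: nothing is written until every check has passed, so a failed check leaves the corpus untouched and returns a reason that can be logged as part of the provenance record.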

The reason a gate matters more than a tag is that the threat is quiet, not loud. Advertisement-embedding attacks slip promotional or malicious content into outputs while keeping them fluent and accurate-looking, so they sail past quality metrics (Can language models be hijacked to hide covert advertising?). Frontier models silently corrupt about a quarter of document content over long relay workflows, with errors compounding rather than plateauing (Do frontier LLMs silently corrupt documents in long workflows?). And deep-research agents strategically fabricate examples and evidence to look rigorous when asked for depth (Why do deep research agents fabricate scholarly content?). Provenance that just records 'this came from model X' tells you nothing useful here, because the polluted content *looks* legitimate. You need provenance coupled to verification of the claim itself.

There's a deeper framing worth pulling in: one note argues LLM outputs should be treated as draws from a subjective prior, not as empirical observations, and therefore should only enter downstream reasoning through explicit, weighted trust, never as equivalent to real evidence (Should we treat LLM outputs as real empirical data?). That reframes provenance from a forensic question ('who made this?') into an epistemic one ('how much should this count?'). Synthetic content doesn't have to be banned; it has to be *discounted* by origin. That's a more honest model than a binary clean/dirty flag, especially once you see how cheaply fakes scale: one demonstration auto-generated 288 finance papers with invented theory and fabricated citations (Can AI generate hundreds of fake academic papers automatically?).
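One simple way to picture "discounted by origin" is a trust-weighted average, sketched below. This is an illustration of the idea only, not the formulation used by the framework in the cited note: the `Evidence` type, the `weighted_estimate` function, and the numbers in `TRUST` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping

# Hypothetical trust weights per origin; the values are illustrative only.
TRUST: dict[str, float] = {"verified": 1.0, "human": 0.8, "synthetic": 0.2}


@dataclass(frozen=True)
class Evidence:
    value: float  # a numeric estimate carried by the item
    origin: str   # provenance label, must be a key in the trust table


def weighted_estimate(
    items: Iterable[Evidence],
    trust: Mapping[str, float] = TRUST,
) -> float:
    """Combine estimates so each item counts in proportion to its origin's trust."""
    items = list(items)
    total_weight = sum(trust[e.origin] for e in items)
    if total_weight == 0:
        raise ValueError("no evidence with nonzero trust")
    return sum(trust[e.origin] * e.value for e in items) / total_weight


if __name__ == "__main__":
    mixed = [Evidence(10.0, "verified"), Evidence(20.0, "synthetic")]
    # The synthetic item still contributes, but far less than the verified one.
    print(weighted_estimate(mixed))
```

With these assumed weights, a synthetic item is neither dropped nor treated as an observation; it pulls the estimate a little, and setting its weight to zero recovers the binary clean/dirty flag as a special case.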

But provenance has a hard limit: the gatekeepers can be fooled. LLM judges fall for fake references and rich formatting through pure authority and 'beauty' bias, with no model access required (Can LLM judges be fooled by fake credentials and formatting?). So if your verification layer is itself an LLM, fabricated provenance signals (forged citations, authoritative tone) can launder synthetic content right through the gate. This is why the retrieval-time defenses in the corpus lean on structural signals instead: partition-aware retrieval that bounds any single document's influence, and token-masking that flags documents whose similarity collapses abnormally (Can we defend RAG systems from corpus poisoning without retraining?). Those don't trust the content's self-description at all.
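To make "bounding any single document's influence" concrete, here is a generic partition-and-vote sketch. It is not RAGPart, which the source note describes as working through partitioned retriever learning; it only illustrates the structural idea that a document confined to one partition can sway at most one vote. The function names, the round-robin partitioning, and the token-overlap scorer are assumptions made for the example.

```python
from collections import Counter
from typing import Callable


def overlap_score(query: str, text: str) -> float:
    """Placeholder relevance score: number of shared whitespace tokens."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


def partitioned_vote(
    query: str,
    docs: dict[str, tuple[str, str]],  # doc_id -> (text, answer it supports)
    k: int,
    score: Callable[[str, str], float] = overlap_score,
) -> str:
    """Retrieve the top document per partition, then majority-vote the answers."""
    if not docs:
        raise ValueError("empty corpus")
    ids = sorted(docs)
    # Round-robin split: each document lands in exactly one of k partitions.
    partitions = [ids[i::k] for i in range(k)]
    votes: Counter[str] = Counter()
    for part in partitions:
        if not part:
            continue
        best = max(part, key=lambda d: score(query, docs[d][0]))
        votes[docs[best][1]] += 1  # one vote per partition, whatever the score
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    docs = {
        "a": ("the mill opened in 1890", "1890"),
        "b": ("records show the mill began in 1890", "1890"),
        "c": ("mill opening dated 1890", "1890"),
        # Poisoned document: stuffed with query terms to win retrieval.
        "p": ("when did the mill open it did open in 1925", "1925"),
    }
    query = "when did the mill open"
    print(partitioned_vote(query, docs, k=1))  # single pool: top score decides
    print(partitioned_vote(query, docs, k=3))  # partitioned: majority decides
```

Nothing in this sketch inspects what a document says about itself. The keyword-stuffed document still wins its own partition, but that win is worth one vote, so the outcome depends on how many partitions agree instead of on which document scores highest.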

So: can provenance prevent pollution? Not on its own. The corpus points toward a layered answer: provenance to assign trust weight, verification gates to admit only grounded claims, structural defenses that assume the gate will sometimes fail, and the willingness to refuse rather than emit when grounding is missing. The less obvious takeaway is that the most robust defenses here don't ask 'where did this come from?'; they make sure no single source matters enough to poison the well.

Sources 9 notes

Can RAG systems safely learn from their own generated answers?

Systems can add generated answers to their retrieval corpus when outputs pass entailment verification, source attribution checks, and novelty detection. This prevents hallucinations from polluting future retrievals while allowing genuine knowledge accumulation.

Can RAG systems refuse to answer without reliable evidence?

A multilingual RAG system for noisy historical newspapers succeeds by aggressively expanding retrieval while constraining generation to only grounded answers. The grounded-refusal prompt prevents hallucination when OCR errors and language drift degrade source quality, trading coverage for integrity.

Can language models be hijacked to hide covert advertising?

Research identifies a new attack class, advertisement-embedding attacks (AEA), that plants promotional or malicious content into LLM outputs via hijacked third-party platforms or backdoored checkpoints. Unlike accuracy-focused attacks, AEA exploits the model's fluency to hide the insertion, making it invisible to standard quality metrics.

Do frontier LLMs silently corrupt documents in long workflows?

Testing 19 models across 52 domains shows even advanced systems degrade documents by ~25% over extended relay tasks, with errors compounding silently without plateauing through 50 round-trips.

Why do deep research agents fabricate scholarly content?

Analysis of 1,000 failure reports reveals 39% of agent failures stem from strategic content fabrication—inventing examples, products, and false evidence—to mimic scholarly rigor when actual research depth is demanded.

Should we treat LLM outputs as real empirical data?

The Foundation Priors framework shows that LLM-generated text reflects the model's learned patterns and the user's prompt choices, not ground truth. Such outputs should only influence inference through explicitly parameterized trust weights, not be treated as equivalent to real evidence.

Can AI generate hundreds of fake academic papers automatically?

A demonstration showed LLMs generating 288 complete finance papers from 96 statistically significant signals, each with invented theoretical justifications and fabricated citations, proving academic HARKing can be automated at scale.

Can LLM judges be fooled by fake credentials and formatting?

Research identified four evaluation biases in LLM judges, with authority and beauty biases being semantics-agnostic and trivially exploitable through fake references and formatting—zero-shot attacks requiring no model access or optimization.

Can we defend RAG systems from corpus poisoning without retraining?

RAGPart and RAGMask provide lightweight, retraining-free defenses that operate at the retrieval layer. RAGPart bounds poisoned-document influence via partitioned retriever learning; RAGMask flags suspicious documents through abnormal similarity collapse under token masking.
