Do newer LLM generations create worse detector bias through increased linguistic divergence?
This reads the question as asking whether each new model generation drifts further from human language in systematic ways, and whether that drift is what throws off the detectors and classifiers built on top of these models. The honest starting point is that this collection has no paper measuring detector accuracy across model generations, so the corpus doesn't test detectors directly. What it does have is a sharp account of *how* and *why* LLM output drifts from the human distribution, which is the raw material any detector bias would be built from. So the answer is lateral: the divergence is real and predictable, but the corpus stops short of the detector claim itself.
The most direct mechanism is frequency bias. The note "Does word frequency correlate with semantic abstraction?" shows that because general words (hypernyms) appear more often than specific ones (hyponyms), and LLMs lean toward common paraphrases, their output systematically drifts toward abstraction and erases expert-level specificity. That's a fingerprint: a measurable way model text differs from how a specialist would actually write. A second fingerprint comes from "Why do large language models fail at complex linguistic tasks?": even top models misidentify embedded clauses and complex nominals, and the errors *worsen predictably* with syntactic depth. Both findings point the same way. Divergence isn't random noise; it's structured, which is exactly the kind of signal a classifier latches onto.
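The frequency-bias mechanism can be sketched in a few lines. This is a toy illustration, not the analysis from the note: the hypernym chain, the word counts, and the function names `most_frequent_paraphrase` and `abstraction_depth` are all invented for the example. It only shows that a rule of "prefer the most common paraphrase" walks up the chain toward the general term whenever general terms are more frequent.

```python
# Toy illustration only: the chain and the counts are invented,
# not taken from WordNet or from any real corpus.
# Each chain runs from most specific (hyponym) to most general (hypernym).
CHAINS = {
    "stent thrombosis": ["stent thrombosis", "blood clot", "complication", "problem"],
}

# Hypothetical corpus frequencies: general words are more common.
TOY_FREQ = {
    "stent thrombosis": 2,
    "blood clot": 40,
    "complication": 300,
    "problem": 5000,
}


def most_frequent_paraphrase(term, chains, freq):
    """Pick the candidate paraphrase with the highest corpus frequency."""
    return max(chains[term], key=lambda word: freq[word])


def abstraction_depth(term, choice, chains):
    """Number of steps up the chain from the original term (0 = kept specific)."""
    return chains[term].index(choice)


choice = most_frequent_paraphrase("stent thrombosis", CHAINS, TOY_FREQ)
depth = abstraction_depth("stent thrombosis", choice, CHAINS)
print(choice, depth)
```

With these invented counts the specialist term is replaced by the most general word in the chain, which is the kind of specificity loss the note describes. A real measurement would need actual corpus frequencies and a real taxonomy.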
Where does that structure come from, and would it deepen with each generation? "Where do cognitive biases in language models come from?" is the load-bearing piece here: biases are planted during pretraining and only nudged by finetuning. That reframes your question, because generational changes in divergence would track changes in pretraining corpora, not surface tuning. "Why do language models struggle with historical legal cases?" makes the corpus dependence concrete: models do worse on older legal cases because recent ones are over-represented in training. So the language a model produces is shaped by *what the corpus over-samples*, and as each generation ingests a different (and increasingly AI-contaminated) slice of text, the divergence profile shifts rather than simply growing.
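One way to check whether divergence grows or merely shifts would be to measure it per generation against a fixed human reference. The sketch below uses Jensen-Shannon divergence over smoothed unigram distributions; that choice of metric, the add-one smoothing, and the names `distribution`, `js_divergence`, and `divergence_profile` are assumptions made for illustration, not something the source notes prescribe. The token lists are placeholders, not data.

```python
import math
from collections import Counter


def distribution(tokens, vocab):
    """Add-one smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = maximal)."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}

    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in a)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def divergence_profile(human_tokens, generations):
    """Map each generation label to its divergence from the human reference."""
    vocab = set(human_tokens).union(*generations.values())
    human = distribution(human_tokens, vocab)
    return {
        label: js_divergence(human, distribution(tokens, vocab))
        for label, tokens in generations.items()
    }


# Placeholder tokens, purely to show the call shape.
profile = divergence_profile(
    ["the", "stent", "clotted"],
    {"gen_a": ["the", "problem", "occurred"], "gen_b": ["the", "stent", "failed"]},
)
print(profile)
```

A single scalar hides which words drive the gap, so a fuller version would report per-word contributions; that is where a shifting profile, as opposed to a growing one, would show up.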
The deeper twist your question doesn't anticipate: divergence is only half the story, because LLMs also *converge* on human patterns in ways that would defeat a naive detector. "Do language models show the same content effects humans do?" and "Do large language models make the same causal reasoning mistakes as humans?" show models reproducing human reasoning errors, in the content-effects case item by item, and "Do humans and LLMs differ fundamentally or just superficially?" argues that inside shared discourse, humans and LLMs draw on the same symbolic substrate. A detector keying on "this looks too machine-like" is chasing a moving target that is simultaneously diverging (frequency bias, syntactic blind spots) and converging (human-like bias signatures).
If you want the cleanest predictive frame, "Can we predict where language models will fail?" is the doorway: treating LLMs as autoregressive probability machines lets you predict *in advance* which tasks have low-probability target outputs, and therefore where model output will fail in distinctive ways. That's the principled version of detector design, and it is why the real answer to your question is less "newer models diverge more" and more "divergence is corpus-shaped and structured, so detector bias shifts with each pretraining set rather than monotonically worsening."
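The autoregressive framing can be made concrete with a deliberately tiny stand-in for an LLM. The sketch below trains a character bigram model with add-one smoothing on an invented corpus and scores two targets; the corpus, the bigram simplification, and the names `train_bigram` and `log_prob` are assumptions for illustration only. It shows the logic of the prediction (a rarely seen sequence gets a lower score before any task is run), not the method used in the note.

```python
import math
from collections import Counter


def train_bigram(text):
    """Count character bigrams and their left-hand contexts."""
    pair_counts = Counter(zip(text, text[1:]))
    context_counts = Counter(text[:-1])
    vocab = set(text)
    return pair_counts, context_counts, vocab


def log_prob(seq, model):
    """Log-probability of seq's transitions under add-one smoothing."""
    pair_counts, context_counts, vocab = model
    total = 0.0
    for a, b in zip(seq, seq[1:]):
        total += math.log((pair_counts[(a, b)] + 1) / (context_counts[a] + len(vocab)))
    return total


# Invented corpus: the forward alphabet fragment is common, the reverse never appears.
TOY_CORPUS = "abcdefg " * 20
model = train_bigram(TOY_CORPUS)

forward = log_prob("abcdefg", model)
backward = log_prob("gfedcba", model)
print(forward > backward)
```

The backward sequence is just as simple logically, yet it scores far lower because its transitions were never observed. That mirrors the note's point about logically simple tasks with low-probability targets.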
Sources (8 notes)
WordNet analysis shows hypernyms (general concepts) occur more frequently than hyponyms (specific ones). Combined with LLMs' frequency bias, this means preferring common paraphrases systematically drifts toward abstraction, erasing expert-level specificity.
Top-tier LLMs like Llama3-70b consistently misidentify embedded clauses, verb phrases, and complex nominals. Performance degrades predictably as syntactic depth increases, revealing that statistical learning captures surface patterns but not deep grammatical rules.
A causal experiment using random-seed variation and cross-tuning showed that models sharing a pretrained backbone exhibit similar bias patterns regardless of finetuning data. Biases are planted during pretraining and merely swayed by instruction tuning.
Supreme Court overruling benchmark (236 pairs) reveals era sensitivity: models perform worse on historical cases than modern ones. Root cause is training corpus over-representation of recent cases, creating shallower representations of older precedent.
LLMs show identical content-sensitivity patterns to humans on NLI, syllogisms, and Wason tasks, with belief-bias signatures matching human error rates item-by-item. This behavioral isomorphism across three independent tasks suggests that content and logical form are architecturally inseparable in transformer reasoning.
LLMs show weak explaining away and Markov violations in collider networks, matching human error patterns exactly. This suggests shared mechanisms rooted in training data statistics rather than categorical reasoning inferiority.
Applied Habermas's observer/participant distinction to AI: from outside, humans and LLMs are utterly different; from within shared discourse, both draw on the same symbolic substrate, making the difference structural rather than absolute.
By framing LLMs as autoregressive probability machines, researchers predicted that tasks with low-probability target responses would be systematically harder, even when logically simple. Experiments confirmed the predictions on tasks such as the backwards alphabet and letter counting.