Why do newer AI models diverge further from human writing patterns?
As language models improve, they seem to generate text that is measurably less human-like in lexical patterns, yet humans struggle to detect this difference. What drives this divergence, and what does it reveal about how models optimize for quality?
The lexical diversity study compared ChatGPT-3.5, ChatGPT-4, o4-mini, and ChatGPT-4.5. Its key finding is that the newer models, o4-mini and 4.5, differ most from human-written text on lexical diversity measures: by these metrics, they are the least human-like of the four.
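The note does not say which statistic the study used to decide which models "differ most" from human text. One common choice, used here only as an assumption, is a standardized mean difference (Cohen's d with a pooled standard deviation) between each model's per-text scores and the human per-text scores on a single lexical diversity measure. The helper names `cohens_d` and `rank_by_divergence` are hypothetical.

```python
import statistics


def cohens_d(model_scores, human_scores):
    """Standardized mean difference (model - human) using the pooled sample SD.

    Assumes each list holds one lexical diversity score per text and has at
    least two entries, so the sample variance is defined.
    """
    n_m, n_h = len(model_scores), len(human_scores)
    var_m = statistics.variance(model_scores)
    var_h = statistics.variance(human_scores)
    pooled_var = ((n_m - 1) * var_m + (n_h - 1) * var_h) / (n_m + n_h - 2)
    diff = statistics.fmean(model_scores) - statistics.fmean(human_scores)
    return diff / pooled_var ** 0.5


def rank_by_divergence(human_scores, scores_by_model):
    """Return (model, d) pairs ordered from most to least divergent from human text."""
    effects = {
        name: cohens_d(scores, human_scores)
        for name, scores in scores_by_model.items()
    }
    # Sort by magnitude: a model can be "non-human" by scoring too high or too low.
    return sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)
```

Here `scores_by_model` would map model labels to lists of per-text scores. A larger |d| means a model sits further from the human distribution on that one measure; repeating the calculation per measure gives a per-dimension picture rather than a single verdict.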
At the same time, human judges consistently fail to detect AI-generated text regardless of model version. More capable models don't become easier to detect; the failure of human judgment is stable across model generations.
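"Failing to detect" AI text is usually judged against chance performance. The note does not describe how the study scored its judges, so this sketch assumes a binary human-vs-AI label with a 50% guessing rate and applies an exact one-sided binomial test to a judge's accuracy. `p_value_above_chance` is a hypothetical helper name.

```python
from math import comb


def p_value_above_chance(correct, trials, chance=0.5):
    """One-sided exact binomial p-value: P(X >= correct) if judges were guessing.

    `chance` is the per-item probability of a correct guess (0.5 for a
    binary human-vs-AI label).
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Example: a judge labels 10 texts and gets 5 right, i.e. exactly chance level.
print(p_value_above_chance(5, 10))  # prints 0.623046875
```

A large p-value means the judge's accuracy is consistent with guessing. On its own it does not prove the judge cannot detect AI text; it only shows that this sample gives no evidence that they can.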
ChatGPT-4.5 produces higher lexical diversity than older models despite generating fewer tokens. It is more lexically dense, but its density pattern is still non-human. The implication is that newer models are not converging on human-like writing by getting better at mimicking human lexical patterns; instead, they are getting better at producing high-quality text that remains systematically different from human text.
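Comparing lexical diversity across models that emit different amounts of text needs care, because the raw type-token ratio (TTR) tends to fall as a text gets longer. Length-corrected measures such as MTLD (measure of textual lexical diversity) were designed to reduce that dependence. The note does not list which measures the study used. The sketch below assumes TTR and a forward/backward MTLD with the conventional 0.72 threshold purely as illustrations, and `tokenize`, `ttr`, and `mtld` are hypothetical helper names.

```python
import re


def tokenize(text):
    """Very rough word tokenizer (assumption): lowercase letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ttr(tokens):
    """Type-token ratio: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens)


def _mtld_pass(tokens, threshold):
    """One directional MTLD pass: count segments whose running TTR hits the threshold."""
    factors = 0.0
    types = set()
    count = 0
    running_ttr = 1.0
    for token in tokens:
        count += 1
        types.add(token)
        running_ttr = len(types) / count
        if running_ttr <= threshold:
            # A full factor is complete; start a fresh segment.
            factors += 1
            types = set()
            count = 0
            running_ttr = 1.0
    # Credit the unfinished tail segment with a partial factor.
    factors += (1.0 - running_ttr) / (1.0 - threshold)
    if factors == 0:
        return float("nan")  # undefined: no repetition at all (or no tokens)
    return len(tokens) / factors


def mtld(tokens, threshold=0.72):
    """Average of forward and backward MTLD passes."""
    return (_mtld_pass(tokens, threshold) + _mtld_pass(tokens[::-1], threshold)) / 2


short = ["a", "b"] * 3   # 6 tokens
long_ = ["a", "b"] * 30  # same pattern, 60 tokens
print(ttr(short), ttr(long_))    # TTR drops tenfold as the text lengthens
print(mtld(short), mtld(long_))  # MTLD stays at 3.0 for both
```

On real data, `tokens` would come from applying `tokenize` to each text. These two measures only illustrate the idea of length-robust diversity; they are not claimed to reproduce the study's figures.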
This suggests that the training objective (RLHF and quality-preference tuning) pushes models toward a different optimum than "human-like lexical diversity." The optimum they converge on is rated as higher quality by human raters, yet it is measurably more distinct from how humans naturally write.
The widening gap between what is measurable and what is perceptible has an important practical consequence: as models improve, naive human-based detection becomes less viable, not more, because the statistical signal grows while human judges get no better at picking it up. Detection therefore has to move to statistical or computational analysis that humans don't spontaneously perform.
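As a minimal sketch of such a check, a two-sided permutation test asks whether per-text scores on one lexical feature (for instance, one diversity score per text) differ between human and AI samples more than random relabeling would produce. The choice of test and feature is an assumption for illustration, not the study's method, and `permutation_test` is a hypothetical name.

```python
import random
import statistics


def permutation_test(human_scores, ai_scores, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in mean feature scores.

    Returns an estimated p-value: the share of random relabelings whose
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(human_scores) - statistics.fmean(ai_scores))
    pooled = list(human_scores) + list(ai_scores)
    n_human = len(human_scores)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_human]) - statistics.fmean(pooled[n_human:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_resamples + 1)
```

A small p-value says the two samples differ on that feature. Pooled over many texts per group, a test like this can flag a difference that no individual reader would notice in a single text, which is the measurable-versus-perceptible gap in miniature.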
Inquiring lines that use this note as a source
This note is a source for these synthesized inquiries.
- Why do different AI models generate similar outputs independently?
- What makes AI posts less likely to invite replies than human-written content?
- What structural difference exists between AI posts and human conversational writing?
- What specific distortions does AI writing assistance introduce into text?
- What textual properties make AI writing feel polished and confident?
- How do readers interpret AI text differently from human text?
- What specific narrative features best distinguish AI from human fiction?
- Do newer language models diverge further from human lexical patterns?
- How does the task type change which linguistic features distinguish AI from humans?
- What specific lexical dimensions separate AI writing from human writing?
- Why does AI writing sound human while failing lexical measurements?
- Does AI writing style remain distinct when content is masked or paraphrased?
- Why do newer AI models diverge further from human text patterns?
- Why do human stories land in statistically rarer regions than AI narratives?
- How do changes in human and AI writing distributions shift rarity measures over time?
- Why do preference-tuned models produce different diversity patterns in code versus creative writing?
- What makes human language fundamentally different from what language models produce?
Related concepts in this collection
- Can human judges detect measurable differences in AI text? (the baseline paradox)
  Research shows LLM text differs statistically across six lexical dimensions, but human readers, even experts, cannot reliably identify which texts are AI-generated. Why does measurement succeed where human perception fails?
- Can humans detect AI text if machines can measure it? (writing angle)
  AI-generated text shows measurable differences from human writing across multiple linguistic dimensions, yet human judges consistently fail to identify it. Why does the gap between what is measurable and what is perceptible exist?
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Do LLMs produce texts with "human-like" lexical diversity?
- The Curse Of Recursion: Training On Generated Data Makes Models Forget
- Linguistic markers of inherently false AI communication and intentionally false human communication: Evidence from hotel reviews
- Word Meanings in Transformer Language Models
- Metadiscursive nouns in academic argument: ChatGPT vs student practices
- Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?
- GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency
- LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
Original note title
newer llm generations diverge further from human lexical patterns while becoming harder to detect