INQUIRING LINE

Why do different AI models generate similar outputs independently?

This explores why separately-trained AI models tend to land on the same answers — the 'Artificial Hivemind' effect — and what in their construction pushes them toward convergence rather than diversity.


You might expect different architectures and labs to produce genuine variety, yet separately trained models tend to land on the same answers. The most direct evidence comes from INFINITY-CHAT, which tested 70+ models across 26K open-ended queries and found that they independently generate strikingly similar, sometimes identical, responses (see "Do different AI models actually produce diverse outputs?"). The culprit isn't coordination; it's shared inputs. Models overlap heavily in their training data and lean on near-identical alignment procedures, so much of the diversity you'd hope to get from running an ensemble of different models evaporates.
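One rough way to make "strikingly similar" concrete is to score how much the responses of different models to the same prompt overlap. The sketch below is only an illustration and is not the method INFINITY-CHAT used: it assumes Jaccard similarity over word sets as the overlap measure, and the model names and responses are invented for the example.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two responses, in [0, 1]."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty responses are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_similarity(responses: dict[str, str]) -> float:
    """Average Jaccard similarity over every pair of model responses."""
    pairs = list(combinations(responses.values(), 2))
    if not pairs:
        raise ValueError("need at least two responses to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical responses to one open-ended prompt (made up for illustration).
responses = {
    "model_a": "time is a river that carries us forward",
    "model_b": "time is a river that flows ever forward",
    "model_c": "time is a thief that steals our days",
}

score = mean_pairwise_similarity(responses)
print(f"mean pairwise similarity: {score:.3f}")
```

A score near 1 across many prompts would point to the kind of homogeneity described above; a real study would likely use embedding-based similarity rather than raw word overlap, which misses paraphrases.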

The corpus suggests the convergence happens in two stages. First, pretraining gives every large model a heavily overlapping picture of the same internet. Then post-training narrows things further: controlled experiments show that reinforcement learning amplifies a single dominant output format from pretraining within the first epoch while collapsing the alternatives (see "Does RL training collapse format diversity in pretrained models?"). So even where a base model held multiple ways of expressing something, the alignment step collapses them toward one, and since labs use similar RLHF recipes, their models plausibly collapse toward similar ones. A related finding reinforces this: RL post-training seems to teach output *format* and organization rather than new knowledge, which is why a 1.5B-parameter model with LoRA-only post-training can match larger full-parameter RL models on reasoning tasks (see "Can small models reason well by just learning output format?"). If what alignment mostly installs is a shared house style, shared style is exactly what you'd expect to see across models.

Here's the part you might not have expected: this convergence pulls models *away* from humans even as it pulls them *toward* each other. Newer generations like GPT-4.5 and o4-mini differ more from human text in lexical diversity than earlier models did, apparently because RLHF optimizes for what raters score as high-quality, not for what looks human (see "Why do newer AI models diverge further from human writing patterns?"). On this reading, the shared optimization target is the deep reason for both effects: models cluster together in a region of output space that human raters reward, which is not the same region where human writers naturally sit.

There's also a more fundamental framing worth sitting with: LLM output is produced by sampling from probability distributions learned from the same kind of training corpus, which is a structurally different operation from how humans use language to address one another (see "Are language models and human speakers doing the same thing?"). When many systems are all estimating roughly the same distribution and decoding it in similar ways, agreement is almost the default rather than a surprise. The genuinely interesting tension is that the same technology is also described as essentially mutable: outputs shift with sampling, prompt wording, and audience (see "Why does AI output change with every prompt and context?"). So models are at once highly variable within a single system and highly convergent across systems: the randomness lives in the sampling, but the center of gravity is shared. If you want to go deeper, the practical sting is in the first note: model ensembles promise diversity they can't actually deliver when every member learned from the same place and was aligned the same way.
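The claim that randomness lives in the sampling while the center of gravity is shared can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two "models" are just hand-written logit vectors over four candidate answers, chosen to be close to each other, and nothing about them comes from the cited notes.

```python
import math
import random
from collections import Counter


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the peak."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


ANSWERS = ["river", "thief", "arrow", "circle"]

# Hypothetical logits: two separately "trained" models that ended up
# estimating nearly the same distribution over the same answers.
MODEL_LOGITS = {
    "model_a": [2.0, 1.0, 0.5, 0.1],
    "model_b": [1.8, 1.1, 0.4, 0.3],
}


def sample_answers(logits, n, rng, temperature=1.0):
    """Draw n answers from the model's distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(ANSWERS, weights=probs, k=n)


rng = random.Random(0)  # fixed seed so the run is repeatable
for name, logits in MODEL_LOGITS.items():
    counts = Counter(sample_answers(logits, 1000, rng))
    print(name, counts.most_common())
```

Individual draws differ from call to call, which is the within-system variability; but both toy models put the most probability on the same answer, so their most common output coincides, which is the cross-system convergence.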


Sources (6 notes)

Do different AI models actually produce diverse outputs?

INFINITY-CHAT analyzed 70+ models across 26K open-ended queries and found an "Artificial Hivemind" effect: models independently generate strikingly similar or identical responses due to overlapping training data and alignment procedures, undermining the diversity benefits of model ensembles.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Can small models reason well by just learning output format?

A 1.5B parameter model with LoRA-only post-training matched larger full-parameter RL models on reasoning tasks, suggesting RL teaches output format organization rather than new factual knowledge. This efficiency indicates reasoning and knowledge storage are separable capabilities.

Why do newer AI models diverge further from human writing patterns?

ChatGPT-4.5 and o4-mini show greater lexical diversity differences from human text than earlier models, yet human judges cannot reliably distinguish them. Training objectives like RLHF appear to optimize for quality ratings rather than human-like writing patterns.

Are language models and human speakers doing the same thing?

LLMs produce strings via probability distributions; humans use language to address and relate to others. They share surface form but differ in what produces output, what it does socially, and what receivers should do with it.

Why does AI output change with every prompt and context?

AI outputs exhibit essential mutability—they vary with sampling, prompt wording, and audience interpretation. This is not a defect but a defining feature of tokens as media, making them fundamentally different from fixed commodities and resistant to traditional quality assurance.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an AI researcher re-evaluating claims about LLM convergence. The question: Why do independently-trained models generate similar outputs?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2025. INFINITY-CHAT's analysis of 70+ models across 26K open-ended queries showed strikingly similar responses, attributed to overlapping training data and near-identical alignment (2025). RL post-training amplifies a single dominant pretraining output format within the first epoch, suppressing alternatives (2025). Tiny models match much larger ones by adapting output *format* via LoRA, not acquiring new knowledge (2025). Newer models (GPT-4.5, o4-mini) diverge further from human lexical patterns than earlier ones, apparently because RLHF optimizes for rater scores, not human naturalness (2025). LLM outputs sample from shared probability distributions over the same training corpus, a structurally different operation from human language use (date unclear). Models are simultaneously high-variance within a single system (via sampling, prompts, context) yet convergent across systems (shared optimization target) (2025).

Anchor papers (verify; mind their dates):
- arXiv:2504.07912 *Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining* (2025)
- arXiv:2504.15777 *Tina: Tiny Reasoning Models via LoRA* (2025)
- arXiv:2508.00086 *Do LLMs produce texts with "human-like" lexical diversity?* (2025)
- arXiv:2510.22954 *Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)* (2025)

Your task:
(1) RE-TEST EACH CONSTRAINT. For RL-driven format collapse: have newer training recipes (e.g., DPO, IPO, constitutional AI variants) since relaxed the single-dominant-format finding, or do they replicate it? For output-space divergence from humans: do instruction-tuning methods that weight human preference diversity differently alter the lexical pattern gap? For ensemble homogeneity: do recent multi-agent frameworks, mixture-of-experts, or speculative decoding recover diversity that shared alignment erased? Separate the durable question (likely: shared corpora + similar alignment → convergence) from perishable claims (e.g., RL always collapses to one format).
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months: any paper showing that controlled diversity in training data, decoding strategy, or RL objective substantially *prevents* cross-model convergence?
(3) Propose 2 research questions that ASSUME the regime may have moved: (a) If alignment methods have diversified since mid-2025, does cross-model convergence now degrade gracefully with heterogeneous post-training? (b) Can you engineer corpora or RL objectives that push models *away* from the shared rater-preferred region, and does that increase diversity without degrading capability?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
