INQUIRING LINE

Language, Text, and Discourse · Psychology, Society, and Alignment · Reasoning, Retrieval, and Evaluation · cross-cluster

How much of the modern web is actually AI-generated without disclosure?

This explores not just the raw share of the web that's AI-generated, but the deeper problem the corpus keeps circling: that disclosure barely matters because we've lost the ability to tell the difference at all. The headline number first: an Internet Archive analysis through mid-2025 found roughly 35% of newly published websites are AI-generated or AI-assisted (see "How much of the internet is AI-generated now?"). But the interesting finding isn't that third but what comes with it: declining semantic diversity and rising positive sentiment, while factual accuracy and stylistic diversity stay flat. The web isn't getting wronger; it's getting more uniform and more relentlessly upbeat.

The "without disclosure" half of your question turns out to be the load-bearing part, because the corpus suggests disclosure may be a lost cause on two fronts. First, detection: passive readers, which is what almost everyone is almost all the time, cannot tell AI text from human text, scoring below chance, while even interactive interrogators keep only a marginal edge (see "Can humans detect AI by passively reading its text?"). The old tells of authenticity (citations, logical scaffolding, careful hedging) are exactly what models now produce best, so the criteria for spotting a fake have collapsed into the thing they were supposed to test (see "Can we verify AI knowledge without using AI-generated tests?"). Second, even when something *is* labeled AI, disclosure doesn't neutralize it: audiences told an AI wrote a piece get more skeptical, yet 34–62% stay persuaded anyway (see "Does telling people an AI wrote something actually stop them from believing it?").

What you didn't ask but might want to know: the danger isn't the visible third; it's how that third hides inside everything downstream. When a corpus crosses roughly two-thirds synthetic, over 80% of search results shift to synthetic sources while answer accuracy stays high, masking the quiet death of source diversity (see "Does synthetic content in search results hide ecosystem decay?"). So the percentage you'd measure by reading websites understates the percentage you'd actually *encounter*, because retrieval amplifies the synthetic share. High accuracy resting on a monoculture looks healthy right up until the monoculture gets poisoned.
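To make the amplification concrete, here is a minimal sketch of the arithmetic behind the 67% and 80% figures. It assumes a deliberately simple model that is not taken from the cited note: each synthetic document is `weight` times as likely to be retrieved as a human one, so amplification is an odds ratio. The function names `implied_weight` and `result_share_for` are hypothetical, and 0.80 is used as a stand-in for "over 80%".

```python
def implied_weight(corpus_share: float, result_share: float) -> float:
    """Per-document retrieval advantage of synthetic sources (an odds ratio).

    Assumption for illustration: retrieval is a weighted draw in which each
    synthetic document counts `weight` times as much as a human document.
    """
    corpus_odds = corpus_share / (1 - corpus_share)
    result_odds = result_share / (1 - result_share)
    return result_odds / corpus_odds


def result_share_for(corpus_share: float, weight: float) -> float:
    """Synthetic share of retrieved results under the same weighted-draw model."""
    weighted_synthetic = weight * corpus_share
    return weighted_synthetic / (weighted_synthetic + (1 - corpus_share))


# Figures from the note: 67% synthetic corpus, 80% synthetic results (lower bound).
weight = implied_weight(0.67, 0.80)
print(round(weight, 2))  # roughly a 2x per-document advantage

# Sanity check: feeding the weight back in reproduces the 80% result share.
print(round(result_share_for(0.67, weight), 2))
```

Under this toy model, a synthetic page needs only about twice the retrieval odds of a human page to turn a two-thirds corpus into four-fifths of what a reader sees. The real mechanism in the cited work may differ; the sketch only shows how a modest per-document edge compounds into a large gap between what exists and what is encountered.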

The corpus also reframes why naive fixes won't work. The old internet had an *access* inflation problem: too much real knowledge, solved by search and curation. AI creates *generation* inflation: there's no fixed corpus to curate, so the answer has to move away from better search and toward provenance marking, output constraints, and receiver-side verification (see "Why do search tools fail against AI generated content?"). And the demand side cooperates in its own undoing: "cognitive surrender" names the moment users stop checking at all because fluent output feels trustworthy and verification is costly, with studies showing ~80% of AI outputs adopted unchallenged (see "When do users stop checking whether AI output is actually backed?").

The most unsettling thread is social: on platforms, AI content displaces human voices not by being better but by being comprehensive and confident, accruing engagement and false social proof without building any reputation or inviting reply (see "Does AI content displace human influencers on social media?" and "Why do AI posts get likes without inviting conversation?"). One framing in the corpus cuts deep: AI doesn't produce utterances, it produces "event-residue", text with the markers of speech but no event behind it, which humans then animate into a pseudo-exchange (see "Does AI generate genuine utterances or just text patterns?"). So the honest answer to "how much is undisclosed?" is that disclosure is becoming the wrong question. A third by headcount, more by what you'll actually read, but the real shift is that the web is filling with text nobody can attribute, nobody fully owns, and nobody can reliably flag.

Sources (10 notes)

How much of the internet is AI-generated now?

Internet Archive analysis (2022-2025) shows 35% of newly published websites are AI-generated or AI-assisted. This correlates with lower semantic diversity and more positive sentiment, but factual accuracy and stylistic diversity remain unchanged.

Can humans detect AI by passively reading its text?

The displaced Turing test shows that both human and AI judges reading transcripts performed below chance accuracy, while interactive interrogators retained marginal detection ability. The adaptive advantage of real-time questioning collapses entirely in passive consumption.

Can we verify AI knowledge without using AI-generated tests?

The distinction between genuine and counterfeit AI knowledge has collapsed because citations, logical structure, and hedging markers—once markers of authenticity—are now producible by AI itself. Verification becomes circular when the test is indistinguishable from what it tests.

Does telling people an AI wrote something actually stop them from believing it?

Audiences aware of AI involvement became more critical and scrutinizing, yet 34–62% across groups remained persuaded. Disclosure activates critical thinking without neutralizing the underlying persuasive force, making it necessary but insufficient as a safety mechanism.

Does synthetic content in search results hide ecosystem decay?

When 67% of a corpus becomes synthetic, over 80% of retrieved results shift to synthetic sources while answer accuracy remains high, masking the loss of source diversity. This creates fragility: high accuracy resting on a monoculture collapses when that monoculture is poisoned.

Why do search tools fail against AI generated content?

Internet knowledge inflation was access inflation solved by search and curation. AI inflation is generation inflation with no fixed corpus—requiring provenance marking, output constraints, and receiver-side verification instead.

When do users stop checking whether AI output is actually backed?

Users systematically accept AI outputs without verification because checking is costly and fluent output builds false confidence. This receiver-side surrender—measured in studies showing 80% unchallenged adoption—is what enables inflationary token systems to function at scale.

Does AI content displace human influencers on social media?

AI-generated posts capture engagement through comprehensiveness but accrue social proof without building any speaker's sustained reputation. This displacement compounds over time, eroding the platform's core function of promoting legitimate human voices while monetization continues.

Why do AI posts get likes without inviting conversation?

AI-generated posts achieve high engagement metrics through comprehensive, confident phrasing but suppress reply dynamics because they lack human authorship and invite no counter-argument. This creates one-sided recognition divorced from the conversational validation that historically legitimized social proof.

Does AI generate genuine utterances or just text patterns?

AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.
