How does AI speech differ from broadcast speech in its carrier structure?
This explores how AI-generated speech breaks from broadcast speech — radio, TV, the recorded voice — not in surface form but in what sits behind the voice carrying it. The short version: every prior kind of orality in media history, even the most heavily mediated, ran through a *carrier-person*. AI is the first that doesn't.
Broadcast speech is what media theory calls secondary orality. The radio announcer or the TV host produces speech that's been technologized, edited, and beamed to millions, but there is still a body in a booth: a speaker who generates and anchors the utterance. The carrier is displaced and amplified, never removed. AI orality severs exactly this link. It produces utterances with all the formal markers of speech (performative, conversational, additive) while no embodied speaker generates or anchors them (Where is the speaker when AI produces speech?). That's the structural novelty: the carrier slot in the chain, occupied for the entire history of human communication, is now empty.
What fills the gap is interpretive labor on the listener's side. The notes frame AI output as *event-residue* rather than utterance: it carries communicative markers inherited from training data but lacks the event structure (the actual occasion of someone meaning something to someone) that produces a real utterance. The human animates that residue into a pseudo-exchange, supplying the orientation that a broadcast speaker would have supplied themselves (Does AI generate genuine utterances or just text patterns?). So the asymmetry is the tell: in broadcast, structure exists on both ends; with AI, it exists only on the human side.
This carrier-absence isn't unique to voice. It shows up in text too, which suggests it's a property of the generative mechanism rather than of the medium. AI writing structurally lacks the *internal appeal to a reader's attention* that human communication performs, which produces the aloofness readers sense (Does AI writing lack the internal appeal to attention that humans use?). Likewise, artificial text eliminates embodied authorship and political situatedness as structural absences rather than stylistic flaws (Does AI-generated text lose core properties of human writing?). The deeper claim across these notes is that LLMs run a fundamentally different operation than human speakers: producing strings from a probability distribution versus addressing and relating to another person (Are language models and human speakers doing the same thing?). That's why communication as social action simply isn't happening, even when the output is fluent (Does AI really communicate or just distribute information?).
Here's the part you might not have expected to care about: the broadcast voice and the AI voice can sound identical and still be doing categorically different things. The conversational interface (the friendly turn-taking, the responsive tone) actively *obscures* the missing carrier, making an empty speaker slot feel occupied. Nor is the acoustic substrate the issue: self-supervised speech models can learn the language-agnostic physics of how a vocal tract produces sound (Do speech models learn language-specific sounds or universal physics?). The difference between AI and broadcast was never in the signal. It's in whether anyone is on the other end of it.
Sources (7 notes)
AI produces utterances with the formal properties of speech—performative, additive, conversational—but no embodied speaker generates or anchors them. This breaks the historical pattern where all prior orality, primary and secondary, depended on a carrier-person, making AI structurally novel in media history.
AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.
Human writing contains an appeal to the reader's attention as a fundamental property of communication itself. AI-generated posts inherit platform visibility but do not perform this internal appeal, producing the reported aloofness readers perceive — a structural absence, not a stylistic defect.
Research shows that artificial text disrupts dialogic symmetry, context continuity, embodied authorship, and political situatedness. These are structural absences, not surface flaws: AI-generated hotel reviews are detected with 80% or higher accuracy because of an inherent falsity about personal experience, one distinct from human deception.
LLMs produce strings via probability distributions; humans use language to address and relate to others. They share surface form but differ in what produces output, what it does socially, and what receivers should do with it.
Communication is a relational act between persons that does work in a relationship; AI generates content without this relational structure, speaker responsibility, or mutual uptake. The conversational interface obscures this structural difference.
Self-supervised speech models learn the language-agnostic physics of how the vocal tract produces acoustics, not language-specific phonetic categories. This explains their multilingual transfer and predicts their downstream task performance better than phonetic probing.