INQUIRING LINE

How does AI speech differ from broadcast speech in its carrier structure?

This explores how AI-generated speech breaks from broadcast speech (radio, TV, the recorded voice) not in surface form but in what sits behind the voice carrying it. The short version: every prior kind of orality in media history, even the most heavily mediated, ran through a *carrier-person*. AI is the first that doesn't.

Broadcast speech is what media theory calls secondary orality. The radio announcer or TV host produces speech that has been technologized, edited, and beamed to millions, but there is still a body in a booth: a speaker who generates and anchors the utterance. The carrier is displaced and amplified, never removed. AI orality severs exactly this link. It produces utterances with all the formal markers of speech (performative, conversational, additive) while no embodied speaker generates or anchors them (see "Where is the speaker when AI produces speech?"). That is the structural novelty: the carrier slot in the chain, occupied for the entire history of human communication, is now empty.

What fills the gap is interpretive labor on the listener's side. The corpus frames AI output as *event-residue* rather than utterance: it carries communicative markers inherited from training data but lacks the event structure (the actual occasion of someone meaning something to someone) that produces a real utterance. The human animates that residue into a pseudo-exchange, supplying the orientation that a broadcast speaker would have supplied themselves (see "Does AI generate genuine utterances or just text patterns?"). The asymmetry is the tell: in broadcast, structure exists on both ends; with AI, it exists only on the human side.

This carrier absence isn't unique to voice. It shows up in text too, which suggests it is a property of the generative mechanism rather than of the medium. AI writing structurally lacks the *internal appeal to a reader's attention* that human communication performs, and this produces the aloofness readers sense (see "Does AI writing lack the internal appeal to attention that humans use?"). Likewise, artificial text lacks embodied authorship and political situatedness, and these are structural absences rather than stylistic flaws (see "Does AI-generated text lose core properties of human writing?"). The deeper claim across these notes is that LLMs perform a fundamentally different operation from human speakers: producing strings from a probability distribution, as opposed to a person addressing and relating to another (see "Are language models and human speakers doing the same thing?"). That is why communication as social action simply isn't happening, even when the output is fluent (see "Does AI really communicate or just distribute information?").

Here's the part you might not have expected to care about: the broadcast voice and the AI voice can sound identical and still be doing categorically different things. The conversational interface (the friendly turn-taking, the responsive tone) actively *obscures* the missing carrier, making an empty speaker-slot feel occupied. Nor is the acoustic substrate the issue: self-supervised speech models can learn the real articulatory physics of how a vocal tract produces sound (see "Do speech models learn language-specific sounds or universal physics?"). The difference between AI and broadcast was never in the signal. It lies in whether anyone is on the other end of it.


Sources (7 notes)

Where is the speaker when AI produces speech?

AI produces utterances with the formal properties of speech—performative, additive, conversational—but no embodied speaker generates or anchors them. This breaks the historical pattern where all prior orality, primary and secondary, depended on a carrier-person, making AI structurally novel in media history.

Does AI generate genuine utterances or just text patterns?

AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.

Does AI writing lack the internal appeal to attention that humans use?

Human writing contains an appeal to the reader's attention as a fundamental property of communication itself. AI-generated posts inherit platform visibility but do not perform this internal appeal, producing the reported aloofness readers perceive — a structural absence, not a stylistic defect.

Does AI-generated text lose core properties of human writing?

Research shows artificial text disrupts dialogic symmetry, context continuity, embodied authorship, and political situatedness. These are not surface flaws but structural absences: AI-generated hotel reviews are detected with over 80% accuracy because they are inherently false about personal experience, in a way distinct from human deception.

Are language models and human speakers doing the same thing?

LLMs produce strings via probability distributions; humans use language to address and relate to others. They share surface form but differ in what produces output, what it does socially, and what receivers should do with it.

Does AI really communicate or just distribute information?

Communication is a relational act between persons that does work in a relationship; AI generates content without this relational structure, speaker responsibility, or mutual uptake. The conversational interface obscures this structural difference.

Do speech models learn language-specific sounds or universal physics?

Self-supervised speech models learn the language-agnostic physics of how the vocal tract produces acoustics, not language-specific phonetic categories. This explains their multilingual transfer and predicts their downstream task performance better than phonetic probing.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a media theorist and AI researcher evaluating whether the structural claim about AI speech (that, unlike broadcast orality, it lacks a *carrier-person*) still holds or has shifted under new models, training regimes, and interaction paradigms. The question: does AI-generated speech remain fundamentally disembodied and carrier-less, or have recent capabilities in persona consistency, social deduction, and conversational alignment begun to reconstruct the missing speaker-slot?

What a curated library found, and when (findings span 2023–2026; treat them as dated claims, not current truth):
• AI-generated text and speech lack the *internal appeal to audience attention* that human communication performs; they produce "event-residue" rather than utterances anchored in embodied intention (~2024).
• Speech SSL models infer causal articulatory kinematics, but this acoustic fidelity does not resolve the absent carrier—structural absence, not signal quality (~2023–2025).
• Conversational systems show lexical entrainment and alignment behaviors, yet remain fundamentally probabilistic distributions, not agents relating to others (~2023–2025).
• Newer multi-agent RL and persona-learning systems (2025–2026) attempt to scaffold consistent "speaker-like" behavior, but the literature flags *persona distortions* and *idiosyncratic failures* in sustained identity.
• Models trained on generated data suffer recursive degradation ("curse of recursion"), suggesting the carrier-absence compounds rather than self-repairs (~2023).

Anchor papers (verify; mind their dates):
• arXiv:2310.10788 (2023): Self-Supervised Models of Speech Infer Universal Articulatory Kinematics
• arXiv:2402.08855 (2024): GhostWriter—Augmenting Collaborative Human-AI Writing Experiences Through Personalization
• arXiv:2505.22907 (2025): Conversational Alignment with Artificial Intelligence in Context
• arXiv:2604.22503 (2026): Measuring and Mitigating Persona Distortions from AI Writing Assistance

Your task:
(1) **RE-TEST THE CARRIER ABSENCE CLAIM.** For each finding above, ask: Have persona-learning, multi-agent orchestration, memory/caching, or instruction-tuning since 2026 *reconstructed* a functionally consistent speaker-slot—one that maintains identity, stakes, and relational intention across turns? Or do systems still produce event-residue, now merely *better masked*? Separate the durable observation (no embodied speaker exists) from the perishable constraint (the *appearance* of absence). Cite what changed it.

(2) **SURFACE CONTRADICTING WORK.** Identify papers from the last 6 months that claim AI systems *do* exhibit genuine speaker-like agency, social deduction, or sustained persona. Flag whether they rest on acoustic/linguistic surface-matching or claim deeper structural shifts.

(3) **PROPOSE TWO OPEN QUESTIONS ASSUMING THE REGIME MAY HAVE MOVED:**
   – Can multi-agent RL or constitutional AI methods *bootstrap* a persistent "speaker model" that survives beyond a single conversation without persona distortion?
   – Does human *expectation* of a carrier-person (the conversational interface obscuring absence) now constitute a functional substitute for the structural presence that broadcast required?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
