INQUIRING LINE

How do audiences evaluate speech when there is no speaker to assess?

This explores what happens to the normal listener's habit of judging a message by judging its speaker once AI produces speech with no embodied person behind it. The corpus's starting point is that this is a genuinely new situation: AI orality has the formal feel of speech (performative, conversational, additive) but breaks the historical rule that every utterance, even a recorded or broadcast one, traces back to a carrier-person you can locate and assess (Where is the speaker when AI produces speech?). So the question isn't rhetorical. Audiences really do lose the anchor they have always used.

What several notes suggest is that listeners may never have been evaluating the speaker as directly as we assume; to a large degree they were evaluating themselves. In debate corpora, who you already are (your political and religious ideology) predicts whether you're persuaded better than the language of the text does, and once a reader's priors are controlled for, many of the supposedly persuasive features of the text largely evaporate (Does what readers believe matter more than what debaters say?; Do linguistic features of persuasion stay the same across audiences?). That reframes the puzzle: if much of "evaluation" was always the audience matching speech against its own beliefs, then a missing speaker matters less than you'd think for whether the message lands. It would also help explain why AI can sway people effectively even though it cannot reliably evaluate the very debates it sways (Can LLMs persuade without actually understanding arguments?).
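The statistical point here, that a feature can look persuasive until reader priors are controlled for, can be illustrated with a toy example. The counts below are invented for illustration and are not drawn from the debate corpora the notes cite; the function names are hypothetical. In this made-up audience, readers whose priors align with a side are persuaded 80% of the time and opposed readers 20% of the time, whether or not some linguistic feature is present. The feature simply happens to appear mostly in texts shown to aligned readers. Pooling everyone makes the feature look predictive; comparing within each prior group (stratification, used here as a simple stand-in for statistical controls, not necessarily the method of the cited work) makes the apparent effect vanish.

```python
from collections import defaultdict

# Hypothetical toy counts, invented for illustration only:
# (reader_prior, feature_present, readers, readers_persuaded)
CELLS = [
    ("aligned", True, 40, 32),   # 80% persuaded
    ("aligned", False, 10, 8),   # 80% persuaded
    ("opposed", True, 10, 2),    # 20% persuaded
    ("opposed", False, 40, 8),   # 20% persuaded
]


def expand(cells):
    """Turn cell counts into one (prior, feature, persuaded) record per reader."""
    records = []
    for prior, feature, n, persuaded in cells:
        records += [(prior, feature, True)] * persuaded
        records += [(prior, feature, False)] * (n - persuaded)
    return records


def persuasion_rate(records):
    """Share of records in which the reader was persuaded."""
    return sum(1 for _, _, persuaded in records if persuaded) / len(records)


def feature_effect(records):
    """Persuasion rate with the feature minus the rate without it."""
    with_feature = [r for r in records if r[1]]
    without_feature = [r for r in records if not r[1]]
    return persuasion_rate(with_feature) - persuasion_rate(without_feature)


def stratified_effect(records):
    """Feature effect computed within each prior group, weighted by group size."""
    groups = defaultdict(list)
    for record in records:
        groups[record[0]].append(record)
    total = len(records)
    return sum(len(rows) / total * feature_effect(rows) for rows in groups.values())


records = expand(CELLS)
print(f"pooled effect:     {feature_effect(records):.2f}")     # ignores reader priors
print(f"stratified effect: {stratified_effect(records):.2f}")  # compares within prior groups
```

Running it prints a pooled effect of 0.36 and a stratified effect of 0.00: in this toy audience the feature's apparent persuasiveness comes entirely from who was reading, not from the language itself.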

But landing is not the same as warranting trust, and here the corpus pushes back hard. A system can produce contextually appropriate speech and still lack what makes a speaker assessable: accountability, an evaluative stance, the relational conditions of being a communicative subject. Tests that any fluent text-producer passes are calibrated to the wrong phenomenon; they detect speech-shaped output, not a someone behind it (Does behavioral speech output prove communicative subjecthood?). This connects to a deeper claim: subjecthood isn't possessed before language and then expressed through it, but produced inside the communicative event itself (Does language create subjects or express them?). On that view a "speaker" is a role the exchange conjures, which is exactly why disembodied AI speech can feel like it has one even when no person is there.

The practical lever, then, isn't finding the missing speaker but doing the grounding work that listeners normally offload onto the speaker. Meaning was never carried by words alone: the same words mean different things to different people because referential grounding is person-specific, and real understanding takes active, collaborative calibration of shared reference (Why do speakers need to actively calibrate shared reference?). Audiences also lean on perceived personality cues, but those are unreliable readouts: acoustic features that signal extraversion in a neutral interview instead predict neuroticism under stress, so even "who is this" judgments are partly artifacts of context rather than stable facts about the speaker (Does personality sound the same in stressful and neutral conversations?).

The thing you may not have known you wanted to know: the absence of a speaker doesn't break evaluation so much as expose how much of it was always being done by the audience. The competent move with AI voices is to stop asking "who said this, and can I trust them?" and start doing explicitly what a trustworthy speaker used to let us skip: checking the claim against shared reference, watching our own priors, and noticing that fluency is not the same as a position someone can be held to.


Sources (8 notes)

Where is the speaker when AI produces speech?

AI produces utterances with the formal properties of speech—performative, additive, conversational—but no embodied speaker generates or anchors them. This breaks the historical pattern where all prior orality, primary and secondary, depended on a carrier-person, making AI structurally novel in media history.

Does what readers believe matter more than what debaters say?

Analysis of debate corpora shows that political and religious ideology labels of voters outpredict linguistic features when modeling debate outcomes. Language effects observed without reader controls are confounded by audience composition correlated with debate topics.

Do linguistic features of persuasion stay the same across audiences?

The linguistic features that predict persuasion success change dramatically once political and religious ideology are added as statistical controls. Features appearing predictive in standard analyses often reflect audience-text matching rather than true language effects, making many published findings potentially artifacts of audience composition.

Can LLMs persuade without actually understanding arguments?

The Thin Line study shows LLMs sway debate participants and audiences but cannot reliably evaluate those same debates, with inter-annotator agreement ranging from near-zero to 0.6. Persuasive competence and pragmatic comprehension are separable capabilities.

Does behavioral speech output prove communicative subjecthood?

Chalmers' test passes any system producing contextually appropriate text, but communicative subjecthood requires relational-normative conditions like accountability and evaluative stance. The test is calibrated to the wrong phenomenon, producing false positives, like a puppet whose movements are walk-shaped even though it is not walking.

Does language create subjects or express them?

Subjecthood is produced within communicative events, not possessed prior to them. This convergent position across philosophy, linguistics, and cognitive science inverts the standard picture of language as a tool used by pre-existing subjects.

Why do speakers need to actively calibrate shared reference?

The same words can mean different things to different speakers because referential grounding is person-specific. True communicative grounding demands collaborative negotiation of how language connects to the world, not mere surface-level word sharing.

Does personality sound the same in stressful and neutral conversations?

Acoustic features that signal extraversion in neutral interviews instead predict neuroticism under stress. Handcrafted acoustic features outperform neural embeddings, suggesting personality is conveyed through specific measurable behaviors rather than holistic speaker style.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about how audiences evaluate AI-generated speech without an embodied speaker. The question remains open: does the absence of a speaker fundamentally alter audience evaluation, or does it expose evaluation mechanisms that were always listener-driven?

What a curated library found — and when (dated claims, not current truth):
Findings span 2019–2025.
• Reader priors predict persuasion outcomes far more reliably than linguistic features; once beliefs are controlled, supposedly persuasive text properties largely disappear (2019).
• LLM persuasiveness dissociates from argument comprehension: AI can sway audiences effectively despite failing to reliably evaluate the same debates it sways (2024).
• Perceived personality in speech is unstable across contexts: acoustic cues read as confident extraversion in calm settings but signal neuroticism under stress, making "who is this" judgments context artifacts rather than stable speaker properties (2025).
• Communicative grounding requires active calibration of shared reference, not passive word-sharing; audiences normally offload this work onto the speaker (2023–2025).
• Subjecthood is produced *within* communicative events, not pre-existing and expressed through language — a "speaker" is a role the exchange conjures (2024–2025).

Anchor papers (verify; mind their dates):
• arXiv:1906.11301 (2019) — Prior Beliefs for Argument Persuasion
• arXiv:2404.09329 (2024) — LLM Persuasiveness and Cognitive Effort
• arXiv:2507.19137 (2025) — Comprehension vs. Persuasion in LLMs
• arXiv:2507.01936 (2025) — Conversational Alignment with AI

Your task:
(1) RE-TEST EACH CONSTRAINT. For the claim that priors dominate persuasion outcomes: have recent multimodal models, chain-of-thought scaffolding, or real-time audience modeling since shifted the ratio? Does the dissociation between LLM persuasiveness and argument-comprehension still hold, or have newer architectures improved reasoning-persuasion alignment? Test whether personality instability persists or whether better context modeling now yields stable trait readings. Separate the durable question (do audiences always ground evaluation in their own beliefs?) from perishable limitations (do current models fail to aid that grounding?).

(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — especially on accountability signals, multi-agent orchestration enabling synthetic "speaker roles," or evaluation harnesses that successfully proxy for embodied speaker assessment.

(3) Propose 2 research questions that ASSUME the regime may have shifted:
   a) If audiences can now explicitly offload grounding work to AI tools (e.g., real-time fact-checking, perspective-listing), does the absence of a speaker become *less* consequential for trust?
   b) Can adversarial or multi-agent setups synthetically reconstruct accountability—the relational condition the library identifies as core to speakerhood—and if so, does that restore speaker-based evaluation?

Cite arXiv IDs; flag anything you cannot ground in a real paper.

Next inquiring lines