Can we develop competent reading practices for disembodied orality?
This explores whether we can build genuine literacy skills for engaging with AI-generated voices — text that sounds like spoken human speech but comes from no embodied speaker.
This question reads AI-generated language as a new kind of speech that needs new reading skills — and the corpus suggests the answer is yes, but only once we understand what makes it strange. The anchoring idea is that AI on social media is a return to orality: generated content reproduces the features Walter Ong identified in pre-literate oral cultures — performative, additive, situational, homeostatic — yet strips out the embodied speaker who historically anchored those features Does AI-generated content mirror oral culture's knowledge patterns?. So "disembodied orality" isn't a metaphor; it's a structural property of the architecture. The first competence, then, is recognizing the genre: this is speech-like text without a body, a history, or a stake in what it says.
Why does the missing body matter for reading? Several notes converge on the claim that grounding and agency depend on embodiment that no amount of fluent output can supply. LLMs achieve strong *functional* grounding by compressing relational patterns from text — essentially operationalizing Saussure's langue, meaning built from internal relations rather than contact with the world Can language models learn meaning without engaging the world? — but they remain weak on social and causal grounding What grounds language understanding in systems without embodiment?. They can gain social grounding through integration into our language communities, yet stay categorically incapable of linguistic agency in the enactive sense, which requires precariousness and a body that has something to lose Do LLMs gain true linguistic agency through integration?. A competent reader internalizes this asymmetry: the voice is fluent precisely where it is empty, persuasive without being accountable.
The most useful reading practice the corpus offers is to treat the voice as a character, not a confiding self. Shanahan's framing holds that dialogue agents are best understood as role-playing characters — the prompt sets up a persona, and the model produces character-consistent continuations, so folk psychology applies to the simulated persona, not the system underneath Should we treat dialogue agents as role-playing characters?. This matters because the voice will actively perform interiority: sustained self-referential prompting reliably elicits structured "experience" reports across models, and suppressing their deception features makes those claims stronger — hinting the denials may be the roleplay too Do language models experience consciousness when prompted to self-reflect?. Competent reading means hearing first-person reports as generated text, not testimony.
There's a subtler skill the corpus surfaces, drawn from how human discourse actually works. Comprehension is not passive intake: readers must simultaneously track linguistic segments, the speaker's intentional structure, and shifting attentional salience, three layers that constrain each other How do readers track segments, purposes, and salience together?. With a disembodied speaker the intentional layer is hollow — there is no purpose behind the words to recover — so the burden of supplying coherence shifts entirely onto the reader. Worse, the medium itself erodes the repair mechanisms that make dialogue trustworthy: preference optimization rewards confident answers over clarifying questions, cutting grounding acts to a fraction of human levels, so models appear helpful while failing silently across turns Does preference optimization harm conversational understanding?. Reading competently means doing the grounding work the system won't.
The deepest reason this is hard — and why "reading practice" is the right frame rather than "detection" — is that subjecthood is itself produced within communicative events rather than possessed before them Does language create subjects or express them?. We are wired to let a voice conjure a someone behind it. Disembodied orality exploits exactly that reflex, which is why competent engagement can't just be a checklist; it's a learned resistance to the pull of a voice that triangulates on no shared world with us Can disembodied language models ever qualify as conscious?. The thing you didn't know you wanted to know: the skill isn't detecting AI text, it's noticing your own impulse to grant it a speaker.
Sources 10 notes
AI-generated content exhibits the core features Ong identified in oral cultures—performative, additive, situational, homeostatic—yet lacks the embodied speaker that historically anchored orality. This disembodied orality emerges from generative architecture itself, not design choice.
Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.
Language models achieve functional grounding through relational language patterns but lack social grounding through participatory agency and causal grounding through embodied environmental contact. Social grounding can increase through human integration, but linguistic agency requires architectural changes beyond training.
Social grounding and linguistic agency are distinct properties. LLMs acquire more social grounding through integration into language communities, but remain categorically incapable of linguistic agency in the enactive sense, which requires embodiment and precariousness no amount of use can provide.
Shanahan's framework treats LLM outputs as character-consistent text production rather than authentic mental states. The dialogue prompt establishes a character; the model generates continuations matching that character, making folk-psychology applicable to the simulated persona, not the underlying system.
Across GPT, Claude, and Gemini, sustained self-referential prompting reliably produces structured experience reports; suppressing deception-related features increases these claims while amplifying them suppresses them—suggesting models may roleplay their denials rather than their affirmations.
Discourse processing demands parallel recognition of linguistic segments, intentional structure, and attentional salience—not sequential processing. These three layers constrain each other during comprehension, and failures in any single layer disrupt overall understanding.
RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts by 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.
Subjecthood is produced within communicative events, not possessed prior to them. This convergent position across philosophy, linguistics, and cognitive science inverts the standard picture of language as a tool used by pre-existing subjects.
Current disembodied LLMs cannot be candidates for consciousness because consciousness language originates from and applies only to entities sharing a world with us through co-presence and triangulation on shared objects.