INQUIRING LINE

What distinguishes surface language form from communicative operation?

This line explores the gap between the *shape* of language (the words, fluency, and surface patterns a system produces) and the *act* of communicating: addressing someone, taking on stakes, and being held to account. The corpus argues that the two look identical but are structurally different operations.


The question is really asking: when a system produces fluent text, is it *doing the same thing* a human speaker does? The corpus converges on a sharp answer: no. Surface form and communicative operation share a face but not an engine. As one note puts it, LLMs and humans "share surface form but differ in what produces output, what it does socially, and what receivers should do with it" (Are language models and human speakers doing the same thing?). A model generates strings via probability distributions; a person uses language to address and relate to someone. The giveaway is the missing layer underneath the words.

What is that missing layer? Several notes name it from different angles. Habermas's framework supplies one: real speech *raises validity claims*, that is, bids for truth, rightness, and sincerity on which the speaker can be challenged. LLM output raises none of these with genuine stakes, which under that lens disqualifies it as speech altogether (Can LLMs raise validity claims in Habermas's sense?). A grounding-research angle says the same thing empirically: humans constantly do *communicative work* (clarifying questions, acknowledgments, understanding checks), and models perform roughly 77.5% fewer of these grounding acts, because preference training actively strips them out in favor of confident, complete-sounding answers (Why do language models sound fluent without grounding?). The unsettling implication: fluency isn't evidence of communication; it is partly the *result of skipping* communication's labor.

The deeper distinction is about where meaning and subjecthood actually live. One thread argues subjecthood isn't a property a speaker carries into a conversation; it's *produced within* the communicative event itself (Does language create subjects or express them?). That's why a behavioral test that just checks for contextually appropriate text is calibrated to the wrong phenomenon: it detects speech *patterns*, not the relational-normative conditions (accountability, an evaluative stance) that make something a communicative act (Does behavioral speech output prove communicative subjecthood?). The same wedge shows up at the level of meaning itself: Bender & Koller argue that form-only training can't recover meaning, because meaning lives in the relation between expressions and communicative intents, which a form-to-form predictor never sees (Can language models learn meaning from text patterns alone?).

The divide isn't only philosophical: it leaves *measurable* fingerprints in what models can and can't learn. Models nail the regularities that are statistically present in text (sound symbolism, priming) but fail at the principles that exist only because language is *for* communicating: word-length economy, scalar implicature, reading what's left unsaid (Why do language models fail at communicative optimization?; Why do LLMs fail at understanding what remains unsaid?). The communicative *logic*, meaning *why* language has the forms it has, isn't present in the data as a trainable signal, so it doesn't get learned. The form survives the copy; the operation that generated the form doesn't.

If you want to follow the consequences further, two notes push into what this means for the human side of the exchange. One argues AI produces "event-residue" carrying inherited communicative markers, which users then *animate* into a pseudo-exchange by supplying the orientation themselves; the structure exists only on the human end (Does AI generate genuine utterances or just text patterns?). Another makes the same point grammatically: we talk *at* models, not *to* them, because "to" presupposes an addressee capable of mutual orientation (Are we really communicating with language models?). Surface form is what the system emits; communicative operation is the relational event we keep mistaking it for.


Sources (10 notes)

Are language models and human speakers doing the same thing?

LLMs produce strings via probability distributions; humans use language to address and relate to others. They share surface form but differ in what produces output, what it does socially, and what receivers should do with it.

Can LLMs raise validity claims in Habermas's sense?

Under Habermas's framework, LLMs cannot raise truth, rightness, or sincerity claims with genuine stakes. Without validity claims, their output fails to qualify as speech, making them non-speakers and non-interlocutors by definition.

Why do language models sound fluent without grounding?

LLMs generate 77.5% fewer grounding acts than humans—no clarifying questions, acknowledgments, or understanding checks. Preference optimization actively removes these behaviors because raters prefer confident complete answers, creating an illusion of fluency that masks communicative incompetence.

Does language create subjects or express them?

Subjecthood is produced within communicative events, not possessed prior to them. This convergent position across philosophy, linguistics, and cognitive science inverts the standard picture of language as a tool used by pre-existing subjects.

Does behavioral speech output prove communicative subjecthood?

Chalmers' test passes any system producing contextually appropriate text, but communicative subjecthood requires relational-normative conditions like accountability and evaluative stance. The test is calibrated to the wrong phenomenon, creating false positives like puppets that make walk-shaped movements without walking.

Can language models learn meaning from text patterns alone?

Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.

Why do language models fail at communicative optimization?

LLMs successfully replicate statistical regularities learnable from text distributions (sound symbolism, priming) but fail at principles requiring pragmatic optimization (word length economy, discourse inference). The gap reveals that communicative logic—why language has certain forms—isn't present as a trainable signal.

Why do LLMs fail at understanding what remains unsaid?

Research shows LLMs pattern-match on explicit language but cannot reason about implicatures, presuppositions, or speaker intentions. They fail at scalar implicature adaptation, ambiguity recognition (32% vs 90% human accuracy), and implicit warrant validation in arguments—core features of pragmatic competence.

Does AI generate genuine utterances or just text patterns?

AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.

Are we really communicating with language models?

LLMs process tokens and generate continuations rather than receive and take up communication. The preposition 'to' presupposes an addressee capable of mutual orientation and shared commitment, which LLMs cannot provide, so Chalmers' investigation rests on an unwarranted linguistic foundation.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing the claim that LLMs produce surface language form WITHOUT communicative operation. The question remains open: what, if anything, have recent models, training methods, or evaluation approaches done to this distinction?

What a curated library found — and when (2022–2025, dated claims not current truth):
• Models generate fluent text via probability; humans use language to address and relate to someone — they share surface form but not engine (2024–2025).
• LLMs perform ~77.5% fewer grounding acts (clarifications, acknowledgments, understanding checks) than humans; preference training actively strips them out (2024–2025).
• Models replicate local statistical regularities (sound symbolism, priming) but fail at communicative-logic principles (word-length economy, scalar implicature) that exist *because* language is for communicating (2022–2024).
• Meaning lives in the relation between expressions and communicative intents, which form-only training cannot access (Bender & Koller framing, echoed 2024–2025).
• Users animate model output into pseudo-exchange by supplying orientation themselves; we talk *at* models, not *to* them (2024–2025).

Anchor papers (verify; mind their dates):
• arXiv:2407.08790 (2024-07) — Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency
• arXiv:2404.19509 (2024-04) — Do Large Language Models Understand Conversational Implicature
• arXiv:2505.22354 (2025-05) — LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High
• arXiv:2510.14665 (2025-10) — Beyond Hallucinations: The Illusion of Understanding in Large Language Models

Your task:
(1) RE-TEST EACH CONSTRAINT. For the grounding-gap claim (77.5% fewer acts), scalar implicature failures, and presupposition-rejection collapse, check whether post-2024 architectures (reasoning models, extended context, tool integration), instruction tuning variants (Constitutional AI, DPO, process supervision), or new evaluation harnesses have narrowed or inverted these gaps. Separate the durable claim (LLMs lack genuine accountability structures) from perishable limitations (they fail *today's* benchmarks). Cite what, if anything, relaxed the constraint.
(2) Surface the strongest *contradicting or superseding* work from the last 6 months — papers claiming models *do* recover communicative operation, or that the distinction dissolves under revised definitions.
(3) Propose 2 research questions that assume the regime may have shifted: (a) Can multi-turn, multi-agent orchestration or memory architectures functionally recreate accountability even if subjecthood remains philosophically absent? (b) Do newer evaluation paradigms (e.g., interactive, dialogue-embedded) reveal communicative competence the library's form-focused tests missed?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
