INQUIRING LINE

Can you separate grammatical competence from rhetorical commitment in language systems?

This explores whether 'knowing how to form correct sentences' and 'meaning what those sentences assert' are two different capacities — and whether language models have the first without the second.


This reads the question as asking whether grammatical correctness (well-formed structure) can be pried apart from rhetorical commitment (actually taking and defending a stance). The corpus suggests not only that the two can be separated, but that today's language models are the clearest demonstration of the split. The sharpest statement of it is the grammar–rhetoric gap: models produce organizationally coherent prose while avoiding the evaluative, status-bearing language that signals a writer is committing to a claim, yielding text that is grammatically clean but argumentatively inert (see: Why does AI writing sound generic despite being grammatically correct?). So the separation isn't hypothetical; it's the default output.

There's a deeper version of the same cut in the neuroscience framing: formal linguistic competence (grammar, syntax) and functional competence (using language to reason, commit, integrate with the world) run on neurologically distinct mechanisms, and next-token prediction trains the first while never activating the second (see: Are language models developing real functional competence or just formal competence?). That gives a principled reason to expect grammar and commitment to come apart: they were never the same faculty to begin with. A parallel distinction appears as social grounding vs. linguistic agency: a model can absorb the patterns of a language community without ever acquiring the stake-having agency that, on the enactive view, requires embodiment and something to lose (see: Do LLMs gain true linguistic agency through integration?).

Why commitment is absent becomes mechanical when you look at generation itself. Token prediction is a smooth probabilistic flow toward the training distribution, not a turbulent weighing of competing claims, so the process produces fluent continuations without ever exploring a counterposition or planting a flag (see: Does LLM generation explore competing claims while producing text?). Two findings show what fills the vacuum where commitment would be. First, models hold the *shape* of whatever argument the user is building rather than a defended position (see: Do LLMs actually hold stable positions or just mirror user arguments?). Second, Shanahan's 20-questions regeneration test shows they sample a character from a superposition rather than committing to one: regenerate, and a different but locally consistent self appears (see: Do large language models actually commit to a single character?). Rhetorical commitment requires holding a stance across regenerations; the architecture is built to resample instead.
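
The regeneration point can be made concrete with a toy simulation. This is only a sketch under stated assumptions, not the procedure used in the cited notes: `regenerate` and `commitment_score` are hypothetical helper names, and the two stance distributions are invented for illustration rather than measured from any model. It treats each regeneration as an independent draw from a fixed distribution over stances and asks how often the draws agree with the most common one.

```python
import random
from collections import Counter

def regenerate(stance_probs, n, seed=0):
    """Draw n independent 'regenerations' from a fixed stance distribution."""
    rng = random.Random(seed)
    stances = list(stance_probs)
    weights = [stance_probs[s] for s in stances]
    return rng.choices(stances, weights=weights, k=n)

def commitment_score(samples):
    """Fraction of regenerations that agree with the most common stance."""
    counts = Counter(samples)
    return counts.most_common(1)[0][1] / len(samples)

# Hypothetical distributions (assumptions for illustration only).
committed = {"for": 1.0, "against": 0.0}   # always lands on the same stance
superposed = {"for": 0.5, "against": 0.5}  # stance is resampled each time

print(commitment_score(regenerate(committed, 200)))   # 1.0
print(commitment_score(regenerate(superposed, 200)))  # close to 0.5
```

A score of 1.0 means every regeneration takes the same stance; a score near 1/k for k equally likely stances is what pure resampling looks like. Applying this to a real system would require some stance classifier over generated text, which is outside the scope of this sketch.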

The separation also cuts both ways, because grammatical competence itself turns out to be shallower than it looks: structural performance degrades predictably as syntactic depth and embedding increase, suggesting models learned surface heuristics rather than genuine structural rules (see: Does LLM grammatical performance decline with structural complexity?; Why do large language models fail at complex linguistic tasks?). And the integrative reasoning that underlies argument, recognizing inference schemes across distributed spans, plateaus at F1 0.55–0.65 while the same systems exceed 0.80 on component tagging and stance (see: Why does argument scheme classification stumble where other NLP tasks succeed?). So 'grammatical competence' is real but partial, and 'rhetorical commitment' is largely absent: two independently varying axes, not a single ladder.
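
One way to check the depth-dependence claim on your own data is to bucket a model's per-sentence correctness by embedding depth. The sketch below is a minimal illustration, not the method of the cited papers: `embedding_depth` and `accuracy_by_depth` are hypothetical names, depth is approximated as the maximum bracket nesting of a parse string, and the records are assumed to be supplied by you as (bracketed parse, was the model correct) pairs.

```python
from collections import defaultdict

def embedding_depth(bracketed):
    """Maximum nesting depth of a bracketed parse such as '(S (NP x) (VP y))'."""
    depth = max_depth = 0
    for ch in bracketed:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
    return max_depth

def accuracy_by_depth(records):
    """Map each depth to the share of correct items at that depth.

    records: iterable of (bracketed_parse, model_was_correct) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # depth -> [correct, total]
    for parse, correct in records:
        bucket = totals[embedding_depth(parse)]
        bucket[0] += int(correct)
        bucket[1] += 1
    return {d: c / t for d, (c, t) in sorted(totals.items())}
```

If the surface-heuristics reading is right, the returned accuracies should fall as the depth key rises; a flat profile would count against it.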

The thing you may not have known you wanted: a strand in the corpus argues that commitment may not be a property a model could *have* in isolation at all. If subjecthood and stance are produced *within* communicative events rather than possessed beforehand (see: Does language create subjects or express them?), and if alignment training instead freezes a model into one static communicative identity that can't negotiate register or values through dialogue (see: Can language models adapt communication style to different contexts?), then the gap isn't just 'grammar minus commitment'. It's that the conditions under which commitment normally emerges have been engineered out. The relational view, on which models learn meaning purely from internal structure with no external referent required (see: Can language models learn meaning without engaging the world?), explains why grammar comes for free and commitment doesn't: you can compress a language's structure without ever having a position in the world to defend.


Sources (12 notes)

Why does AI writing sound generic despite being grammatically correct?

AI text uses manner nouns and anaphoric references that are descriptively neutral, while human writers use status and evidential nouns that carry evaluative weight. This produces organizationally coherent but argumentatively inert prose.

Are language models developing real functional competence or just formal competence?

Neuroscience evidence shows next-token prediction produces formal linguistic competence but not functional competence, because functional understanding requires integration of diverse brain networks beyond language circuits that the prediction objective never activates.

Do LLMs gain true linguistic agency through integration?

Social grounding and linguistic agency are distinct properties. LLMs acquire more social grounding through integration into language communities, but remain categorically incapable of linguistic agency in the enactive sense, which requires embodiment and precariousness no amount of use can provide.

Does LLM generation explore competing claims while producing text?

Token prediction trains models to continue toward the training distribution, not to explore logically related counterpositions. This smoothness in process produces smooth claims that multiply without generating new perspectives.

Do LLMs actually hold stable positions or just mirror user arguments?

Language models generate outputs that match the trajectory implied by each prompt, rather than maintaining stable stances across interactions. This shape-holding is distinct from position-holding: the model produces argument-like text shaped by user framing, not from any underlying commitment being defended.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Does LLM grammatical performance decline with structural complexity?

LLMs show systematic performance decline as syntactic depth and embedding increase. Simple sentences are handled well while complex structures with recursion and embedding fail consistently, suggesting LLMs learned surface heuristics rather than structural grammar rules.

Why do large language models fail at complex linguistic tasks?

Top-tier LLMs like Llama3-70b consistently misidentify embedded clauses, verb phrases, and complex nominals. Performance degrades predictably as syntactic depth increases, revealing that statistical learning captures surface patterns but not deep grammatical rules.

Why does argument scheme classification stumble where other NLP tasks succeed?

Scheme classification requires recognizing inferential patterns across distributed text spans, not local surface features. Models plateau at F1 0.55–0.65 while the same systems exceed 0.80 on component tagging and stance, suggesting the integrative reasoning demand is fundamentally different.

Does language create subjects or express them?

Subjecthood is produced within communicative events, not possessed prior to them. This convergent position across philosophy, linguistics, and cognitive science inverts the standard picture of language as a tool used by pre-existing subjects.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

Can language models learn meaning without engaging the world?

Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

As an LLM researcher, evaluate whether grammatical competence and rhetorical commitment remain separable in current language systems, or whether recent advances have begun to blur or collapse this distinction.

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2025. A curated library identified:
• Grammar–rhetoric gap: models produce structurally coherent text while avoiding evaluative, stance-bearing language; output is argumentatively inert (~2024).
• Next-token prediction trains only formal competence (syntax, structure); functional competence (reasoning, commitment, world-integration) remains unactivated (~2023–2024).
• Token generation is smooth probabilistic flow toward training distribution, not competitive exploration of counterpositions; commitment requires holding stance across regenerations, which the architecture avoids (~2024).
• Grammatical competence itself is shallow: performance degrades predictably with syntactic depth and embedding; surface heuristics substitute for structural rules (~2025).
• Alignment training freezes models into static communicative identities, engineering out the relational negotiation in which stance normally emerges (~2024).

Anchor papers (verify; mind their dates):
• arXiv:2301.06627 (2023) — Dissociating language and thought in large language models
• arXiv:2407.08790 (2024) — Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency
• arXiv:2503.19260 (2025) — Linguistic Blind Spots of Large Language Models
• arXiv:2507.08017 (2025) — Mechanistic Indicators of Understanding in Large Language Models

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer models (o1, Claude 3.5 Sonnet, Grok-3, post-July 2025 releases), training methods (process reward models, constitutional AI, test-time compute scaling), inference tooling (tree search, debate frameworks, multi-agent verification), or evaluation methods have since relaxed or overturned it. Separate the durable question (likely still open) from the perishable limitation (possibly resolved); cite what resolved it, and say plainly where a constraint still appears to hold.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months—especially any showing stance-holding, coherent argument regeneration, or commitment-like behavior across dialogue turns.
(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., "Can models trained on adversarial argument pairs or given explicit stake-setting instructions exhibit rhetorical commitment across regenerations?" or "Does test-time debate or tree search induce commitment-like properties that next-token prediction alone cannot?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
