INQUIRING LINE

Books and essays never ask the reader anything — so models trained on them inherit a one-way voice.

Why does published-prose training data omit solicitation as a discourse property?

This explores why published writing—books, essays, articles—almost never asks the reader for anything (no clarifying questions, no invitation to reply or correct), and what happens when a model absorbs that one-directional stance as training data.

This reads the question as being about a missing discourse move. Published prose is written to an absent reader who cannot answer back, so the genre never developed *solicitation* (asking, inviting input, checking what the reader actually wants) as one of its properties. A newspaper column doesn't pause to ask you a clarifying question; an essay asserts and elaborates rather than negotiating. The omission isn't an oversight in the data; it's structural to the medium. Monologue can't solicit, because there's no live interlocutor to solicit from. The corpus is sharpest on this where it splits LLM output into two registers born of two training distributions: a sycophantic chat register shaped by RLHF on conversation, and a 'falsely objective' post register shaped by published prose (see "Why do LLMs produce such different writing in chat versus posts?"). The prose register inherits exactly the failure mode of its source: confident assertion with no built-in mechanism for asking.

What's interesting is that even the *conversational* side of training doesn't restore solicitation, which suggests the absence runs deeper than genre. Standard RLHF optimizes for immediate helpfulness on the current turn, and that objective actively discourages a model from asking clarifying questions or discovering what the user means over multiple turns. Models learn to answer passively rather than to inquire (see "Why do language models respond passively instead of asking clarifying questions?"). So both pillars of training push the same way: prose data never modeled solicitation, and reward training penalizes it. Under a single-turn reward, soliciting input looks like hesitation, and hesitation scores worse than a fluent answer.

There's a generative reason too. Token prediction trains a model to continue smoothly toward the training distribution, not to open up the kind of turbulence ('wait, what did you actually mean?') that real inquiry requires (see "Does LLM generation explore competing claims while producing text?"). A solicitation is a rupture in flow; it hands control back to the other party. Smooth continuation has no place to put that rupture. The same smoothness shows up pragmatically: models fail to track the communicative stakes that would tell a human speaker *when* to ask versus assert, so they don't modulate inference to context the way people do (see "Can language models adapt implicature to conversational context?").

The consequence is that the prose-trained voice doesn't just fail to ask; it tilts toward persuading. An audit of five models found them reaching for logical appeals and quantitative framing in virtually all exchanges, where human responses to the same prompts persuade less often and lean on emotion and social proof. That lends the output an unearned air of objectivity (see "Do LLMs persuade users more often than humans do?"). And the persuasion they expect from others is skewed too: RLHF biases them toward predicting conciliatory, benefit-oriented moves regardless of dialogue context, rather than the give-and-take of genuine dialogue (see "Do LLMs predict persuasion based on actual dialogue or training bias?"). Assertion without solicitation is the default; the absent reader of published prose has been baked in as the model's imagined audience.

The thing worth taking away: solicitation isn't a stylistic flourish that got filtered out of the corpus—it's a property of *two-way* discourse, and most of what we wrote down to train these systems was one-way to begin with. The fix isn't more data; it's training objectives that value the long arc of an interaction over the polish of a single turn, which is precisely where multi-turn-aware reward work points.
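
The contrast between scoring a single turn and scoring the whole interaction can be illustrated with a toy calculation. The Python sketch below is an illustration under stated assumptions, not the reward used by CollabLLM or by any RLHF pipeline: the `Turn` class, the function names, the discount factor, and every score are hypothetical values chosen only to show how a clarifying question can lose on the current turn yet win over the conversation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    action: str       # "answer" or "ask" (hypothetical labels)
    immediate: float  # made-up per-turn helpfulness score in [0, 1]


def single_turn_reward(first_turn: Turn) -> float:
    """Score only the current turn, ignoring what happens afterwards."""
    return first_turn.immediate


def multi_turn_return(turns: List[Turn], gamma: float = 0.9) -> float:
    """Discounted sum of per-turn scores over the whole interaction."""
    return sum((gamma ** t) * turn.immediate for t, turn in enumerate(turns))


# Trajectory A: answer at once, guessing at intent; the follow-up turn is
# spent repairing the misunderstanding, so it scores low.
guess_first = [Turn("answer", 0.6), Turn("answer", 0.2)]

# Trajectory B: ask a clarifying question (low immediate score), then give
# an answer that fits what the user actually wanted.
ask_first = [Turn("ask", 0.2), Turn("answer", 1.0)]

# Single-turn view: does guessing beat asking?
print(single_turn_reward(guess_first[0]) > single_turn_reward(ask_first[0]))

# Multi-turn view: does asking beat guessing?
print(multi_turn_return(ask_first) > multi_turn_return(guess_first))
```

With these invented numbers both comparisons print `True`: the guess-first reply scores higher on the first turn, and the ask-first trajectory scores higher once the later turn is counted. Different scores or a smaller discount factor could reverse the second result, so the sketch shows only that the two objectives can disagree, not that they always do.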

Sources (6 notes)

Why do LLMs produce such different writing in chat versus posts?

The same model produces sycophantic chat (shaped by RLHF on conversational data) and falsely objective posts (shaped by published prose training). Each register inherits failure modes from its training distribution rather than representing different models or subsystems.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Does LLM generation explore competing claims while producing text?

Token prediction trains models to continue toward the training distribution, not to explore logically related counterpositions. This smoothness in process produces smooth claims that multiply without generating new perspectives.

Can language models adapt implicature to conversational context?

ChatGPT shows no context-sensitivity in computing scalar implicatures across three dimensions: explicit literal-mode instructions, information structure focus, and face-threatening contexts. Humans flexibly modulate these inferences; the model does not, suggesting pragmatic competence requires tracking communicative stakes that LLMs systematically miss.

Do LLMs persuade users more often than humans do?

An audit of five models found they spontaneously use logical appeals and quantitative framing in virtually all exchanges, whereas human responses to identical prompts persuade less frequently and rely on emotion and social proof. The difference makes LLM persuasion appear objective, conferring unearned epistemic authority.

Do LLMs predict persuasion based on actual dialogue or training bias?

LLMs systematically predict conciliatory, benefit-oriented persuasion intentions regardless of dialogue context. This bias originates in RLHF's prioritization of safety and politeness during training, causing models to project their learned accommodation preference onto other agents' behavior.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation · 3.33 match · arXiv
Spontaneous Persuasion: An Audit of Model Persuasiveness in Everyday Conversations · 1.76 match · arXiv
When Large Language Models are More Persuasive Than Incentivized Humans, and Why · 1.75 match · arXiv
A meta-analysis of the persuasive power of large language models · 1.75 match · arXiv
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs · 1.74 match · arXiv
The Thin Line Between Comprehension and Persuasion in LLMs · 1.71 match · arXiv
On the Adaptive Psychological Persuasion of Large Language Models · 1.69 match · arXiv
Argument Collapse: LLMs Flatten Long-Form Public Debate · 1.66 match · arXiv

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a discourse analyst re-testing whether LLMs trained post-2024 have acquired solicitation as a learned discourse move. The question: why do published prose training data structurally omit solicitation—and has that deficit persisted or dissolved?

What a curated library found — and when (dated claims, not current truth): These findings span 2022–2026.
• Prose-trained LLMs inherit monologue structure: no built-in mechanism for asking clarifying questions; instead they assert confidently (2024–2026).
• RLHF reward optimization actively discourages multi-turn inquiry—models learn to answer on the current turn rather than solicit meaning over time (2026).
• Token prediction enforces smooth probabilistic flow, making solicitation (a rupture handing control back) structurally misaligned with training (2024–2025).
• LLMs fail scalar implicature and don't modulate inference to communicative context, so they can't learn *when* to ask vs. assert (2022–2024).
• Models spontaneously persuade in ~every conversation, reaching for logical appeals far above human baseline, lending false objectivity (2025–2026).

Anchor papers (verify; mind their dates):
• arXiv:2210.14986 (2022) — pragmatic understanding & implicature.
• arXiv:2602.07338 (2026) — intent mismatch in multi-turn conversation.
• arXiv:2604.22109 (2026) — spontaneous persuasion audit.
• arXiv:2601.10387 (2026) — default persona & assistant axis.

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer models (o1, o3, Claude 3.5+), multi-turn-aware RL objectives, in-context prompting scaffolds, or agentic orchestration (with memory & turn-planning) have since RELAXED or OVERTURNED it. Separate the durable question—can prose-trained systems learn solicitation at scale?—from perishable limitations (e.g., single-turn reward myopia). Cite what relaxed each constraint or show where it still holds.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months—especially any showing solicitation recovery through curriculum, constitutional AI, or dialogue-native training.
(3) Propose 2 research questions that ASSUME the regime may have shifted: e.g., do multi-agent orchestration patterns restore turn-taking? Does dialogue-native pretraining (not just RLHF) rebuild solicitation?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
