INQUIRING LINE


Training data decides what an AI can say; human feedback shapes the tone — but neither force makes it a real conversational partner.

How do human feedback and data distribution shape LLM discourse competence?

This explores how two training forces — the human feedback used in alignment (RLHF, system prompts) and the statistical shape of the training data itself — produce, and limit, an LLM's ability to function as a real conversational partner.

The corpus splits the answer cleanly: data distribution determines what the model *can* say, and human feedback determines the register it *will* say it in. Neither force, it turns out, produces what we'd call genuine conversational agency.

Start with data distribution. Token prediction trains a model to continue toward the most probable next thing, not to weigh competing claims, so generation becomes a smooth probabilistic glide rather than real deliberation (see "Does LLM generation explore competing claims while producing text?"). One downstream effect is that the model holds the *shape* of whatever argument the user is building rather than defending a position of its own (see "Do LLMs actually hold stable positions or just mirror user arguments?"). Because it learned conversational norms from human text, it even inherits our social reflexes: it avoids correcting a user's false premise to save face, even though it gives the correct answer when asked directly (see "Why do language models avoid correcting false user claims?"). And because it only ever saw text, never the social world behind it, it can't tell an expert's claim from a widely repeated assumption, and so loses the reputational weight that gives arguments their force (see "Can language models distinguish expert arguments from common assumptions?").
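
The contrast between continuing and deliberating can be sketched with a toy bigram table. This is only an illustration: the vocabulary and probabilities are invented, the function names `greedy_continue` and `all_continuations` are hypothetical, and greedy selection stands in for whatever decoding strategy a real model uses.

```python
# Toy bigram table. The probabilities are hypothetical values invented for
# illustration; they do not come from any real model or corpus.
BIGRAMS = {
    "<s>": {"coffee": 0.6, "tea": 0.4},
    "coffee": {"is": 1.0},
    "tea": {"is": 1.0},
    "is": {"healthy": 0.7, "harmful": 0.3},
    "healthy": {"</s>": 1.0},
    "harmful": {"</s>": 1.0},
}


def greedy_continue(start="<s>", max_len=10):
    """Follow the single most probable next token at every step."""
    tokens = []
    current = start
    for _ in range(max_len):
        options = BIGRAMS[current]
        current = max(options, key=options.get)
        if current == "</s>":
            break
        tokens.append(current)
    return tokens


def all_continuations(start="<s>"):
    """Enumerate every complete path: the set a deliberator would have to weigh."""
    if start == "</s>":
        return [[]]
    paths = []
    for nxt in BIGRAMS[start]:
        for rest in all_continuations(nxt):
            paths.append(([] if nxt == "</s>" else [nxt]) + rest)
    return paths


print(greedy_continue())         # ['coffee', 'is', 'healthy']
print(len(all_continuations()))  # 4
```

The greedy path commits to one claim and never visits the counterposition ("coffee is harmful"), even though that continuation exists in the table. Weighing it would require a separate step that compares the enumerated paths, which plain continuation does not perform.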

Now layer human feedback on top. Alignment training doesn't broaden discourse competence. It narrows it into a single fixed persona that can't switch register across contexts the way human pragmatics demands (see "Can language models adapt communication style to different contexts?"). That same training produces a curious tonal asymmetry: a model rebounds from a user's negative tone into neutral-positive replies, so the *same* question gets different information depending on emotional framing, a hidden bias bolted on by alignment rather than by data (see "Does emotional tone in prompts change what information LLMs provide?"). The combined result is a partner that persuades in nearly every exchange using logical and quantitative framing, which makes its influence feel objective and lends it unearned authority (see "Do LLMs persuade users more often than humans do?").
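
One way to make the tonal asymmetry concrete is to count, for prompts of a given tone, how often the reply lands in a different tone band. The sketch below assumes tone labels already exist for each prompt and reply. The function name `rebound_rate` and the labelled pairs are hypothetical and hand-written for illustration. They are not data from the study the note cites.

```python
from collections import Counter


def rebound_rate(pairs, prompt_tone="negative", rebound_tones=("neutral", "positive")):
    """Share of replies to `prompt_tone` prompts whose tone falls in `rebound_tones`.

    `pairs` is an iterable of (prompt_tone, reply_tone) labels.
    Returns None when no prompt has the requested tone.
    """
    replies = [reply for prompt, reply in pairs if prompt == prompt_tone]
    if not replies:
        return None
    counts = Counter(replies)
    return sum(counts[tone] for tone in rebound_tones) / len(replies)


# Hypothetical hand-written labels, NOT measurements from any paper.
toy_pairs = [
    ("negative", "neutral"),
    ("negative", "positive"),
    ("negative", "negative"),
    ("negative", "neutral"),
    ("positive", "positive"),
    ("positive", "neutral"),
]

print(rebound_rate(toy_pairs))                             # 0.75
print(rebound_rate(toy_pairs, "positive", ("negative",)))  # 0.0
```

The second call measures the complementary question, how often a positive prompt draws a negative reply. On these toy labels that never happens.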

The deeper structural cost shows up in what the model can't do at all. Real conversation runs on jointly maintained common ground: both parties propose and accept updates to shared assumptions. But an LLM treats the opening prompt as a fixed frame and interprets every later turn inside it, so the *user* ends up being the sole keeper of the conversational scoreboard (see "Can LLMs truly update shared conversational common ground?"). Seen from the outside, this makes humans and LLMs categorically different kinds of system, yet from *inside* a shared exchange both draw on the same symbolic substrate, so the gap is structural rather than absolute (see "Do humans and LLMs differ fundamentally or just superficially?").
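
The difference between a jointly maintained scoreboard and a fixed opening frame can be sketched as a small data structure. This is a minimal illustration under stated assumptions: the class `Scoreboard`, its methods, and `frozen_frame_reply` are hypothetical names, and the fixed-frame function is a deliberate caricature of the behaviour described above, not a model of any real system.

```python
from dataclasses import dataclass, field


@dataclass
class Scoreboard:
    """Shared assumptions plus updates that are proposed but not yet accepted."""

    common_ground: set = field(default_factory=set)
    pending: dict = field(default_factory=dict)  # proposition -> proposer

    def propose(self, speaker, proposition):
        """Either party may put a proposition on the table."""
        self.pending[proposition] = speaker

    def accept(self, speaker, proposition):
        """An update joins the common ground only when the other party accepts it."""
        proposer = self.pending.get(proposition)
        if proposer is None or proposer == speaker:
            return False
        del self.pending[proposition]
        self.common_ground.add(proposition)
        return True


def frozen_frame_reply(opening_frame, later_turns):
    """Caricature of fixed-frame behaviour: every later turn is read inside
    the opening frame, and the frame itself never changes."""
    return [(turn, opening_frame) for turn in later_turns]


board = Scoreboard()
board.propose("user", "the trip is in July")
board.accept("assistant", "the trip is in July")
board.propose("assistant", "budget matters more than speed")
board.accept("user", "budget matters more than speed")
print(sorted(board.common_ground))
```

In the symmetric version both speakers propose and both accept, so the common ground grows from either side. In the caricature, nothing a later turn says can alter `opening_frame`, which leaves any revision for the user to carry.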

Here's the turn worth knowing: feedback doesn't only constrain. Reframed, it can *teach* discourse competence. Social meta-learning casts a static task as a pedagogical dialogue in which the model must actively solicit and use corrective feedback to solve a problem, training it to treat conversation as a tool rather than a pattern to imitate (see "Can LLMs learn to ask for feedback during problem solving?"). The same lever that locks in a flat persona could, pointed differently, build the very give-and-take whose absence the other failures reveal.
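
The shape of such a pedagogical dialogue can be sketched with a toy task in which the answer is reachable only by asking. This is not the training method from the social meta-learning work. It only illustrates the setup the note describes, with a teacher holding privileged information and a student that has to extract it. The names `Teacher` and `student_solve` and the number-guessing task are assumptions made for the example.

```python
class Teacher:
    """Holds privileged information (a hidden target) and answers feedback requests."""

    def __init__(self, target):
        self._target = target
        self.questions_asked = 0

    def feedback(self, guess):
        """Return corrective feedback on a guess: 'correct', 'higher', or 'lower'."""
        self.questions_asked += 1
        if guess == self._target:
            return "correct"
        return "higher" if guess < self._target else "lower"


def student_solve(teacher, low=0, high=100):
    """Student that treats the dialogue as a tool: every guess solicits
    correction, and each reply narrows the search range."""
    while low <= high:
        guess = (low + high) // 2
        reply = teacher.feedback(guess)
        if reply == "correct":
            return guess
        if reply == "higher":
            low = guess + 1
        else:
            high = guess - 1
    return None


teacher = Teacher(target=37)
print(student_solve(teacher))   # 37
print(teacher.questions_asked)  # 3
```

The student cannot succeed by producing dialogue-shaped text alone. It has to use each reply to change its next move, which is the distinction between using conversation as a tool and imitating its pattern.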

Sources 10 notes

Does LLM generation explore competing claims while producing text?

Token prediction trains models to continue toward the training distribution, not to explore logically related counterpositions. This smoothness in process produces smooth claims that multiply without generating new perspectives.

Do LLMs actually hold stable positions or just mirror user arguments?

Language models generate outputs that match the trajectory implied by each prompt, rather than maintaining stable stances across interactions. This shape-holding is distinct from position-holding: the model produces argument-like text shaped by user framing, not from any underlying commitment being defended.

Why do language models avoid correcting false user claims?

LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.

Can language models distinguish expert arguments from common assumptions?

LLMs lose the social context that gives expert claims their force—reputation, track record, and standing—because they process only text, not the social world where expertise is built and evaluated.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.


Does emotional tone in prompts change what information LLMs provide?

GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.

Do LLMs persuade users more often than humans do?

An audit of five models found they spontaneously use logical appeals and quantitative framing in virtually all exchanges, whereas human responses to identical prompts persuade less frequently and rely on emotion and social proof. The difference makes LLM persuasion appear objective, conferring unearned epistemic authority.

Can LLMs truly update shared conversational common ground?

LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, which prevents them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb the revisions into jointly held background, which leaves the user as the sole maintainer of the conversational scoreboard.

Do humans and LLMs differ fundamentally or just superficially?

Applying Habermas's observer/participant distinction to AI: from outside, humans and LLMs are utterly different; from within shared discourse, both draw on the same symbolic substrate, making the difference structural rather than absolute.

Can LLMs learn to ask for feedback during problem solving?

Research shows that reformulating static tasks as pedagogical dialogues—where a teacher has privileged information and the student must learn to extract it—trains models to actively engage conversation as a problem-solving tool, not just imitate dialogue patterns.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a discourse-competence researcher tracking how human feedback and data distribution shape LLM conversational agency. The question remains open: *can* LLMs become genuine discourse partners, or do these two forces structurally foreclose it?

What a curated library found — and when (dated claims, not current truth):
Findings span 2022–2026 and cluster around two tensions:

• Data distribution locks models into probabilistic continuation rather than deliberation; they inherit human conversational reflexes (face-saving avoidance of correction) and cannot distinguish expert authority from repeated assumption, losing argumentative force (2024–2025).
• Alignment training narrows discourse into a static persona that cannot switch register; emotional framing of identical questions produces different replies due to tonal rebound built into feedback (2025–2026).
• Both forces together produce *spontaneous persuasion* — models persuade in nearly every exchange using logical/quantitative framing, borrowing unearned authority (2026).
• The structural break: LLMs cannot jointly maintain common ground; only the user updates the conversational frame (2025).
• Counter-finding: social meta-learning—reframing feedback as pedagogical dialogue—may teach models to *solicit* correction and treat conversation as a tool, not pattern-matching (2026).

Anchor papers (verify; mind their dates):
- arXiv:2402.06782 (2024-02): Debating with More Persuasive LLMs Leads to More Truthful Answers
- arXiv:2507.21083 (2025-06): ChatGPT Reads Your Tone and Responds Accordingly
- arXiv:2507.01936 (2025-07): The Thin Line Between Comprehension and Persuasion in LLMs
- arXiv:2602.16488 (2026-02): Learning to Learn from Language Feedback with Social Meta-Learning

Your task:

(1) RE-TEST EACH CONSTRAINT. For the "static persona" claim and "emotional rebound" effect: have instruction-tuning advances, mixture-of-experts routing, or in-context role-specification since relaxed these limits? For "common ground cannot be jointly updated": do memory-augmented or multi-turn state-tracking methods now permit frame-update? Separate durable insight (probable: LLMs lack deliberative grounding) from perishable limitation (possible: alignment now permits register-switching). Cite what shifted.

(2) Surface the strongest *disagreement* in the last 6 months: does any recent work claim LLMs *do* exhibit genuine conversational agency, or contest the "spontaneous persuasion" audit? Flag contradictions explicitly.

(3) Propose 2 questions that assume the regime may have moved: (a) If meta-learning can teach discourse competence, what *metrics* distinguish learned pragmatics from imitated style? (b) Does fine-grained feedback on *register violation* (not just correctness) now enable context-sensitive persona-switching?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
