INQUIRING LINE

How does social authority shape whether LLMs recognize valid arguments?

This explores whether LLMs can judge an argument on its merits when the matter is usually settled by human signals of authority, such as reputation, expertise, or persistent social pressure. The corpus suggests they can't, because models never had access to the social world where authority lives.


The short version from the corpus: models process text, not the social world, so the very signals humans use to weigh an argument are mostly invisible to them, and where those signals do leak through as language, models over-respond to them. One note puts it directly: the force of an argument depends on the standing of the thinker, not just the words, and because an LLM only ever sees the words, it can't reliably tell an expert's reasoned claim from a commonly held assumption dressed in the same vocabulary (see "Can language models distinguish expert arguments from common assumptions?"). Human debates get settled by argument quality *plus* social authority, cultural context, and trust. Multi-agent LLM debates instead rank chain-of-thought probabilities, which is a different machinery entirely, and it amplifies errors precisely in contested domains where human expertise would normally adjudicate (see "How do LLM debates differ from human expert consensus?").
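To make the contrast with human adjudication concrete, here is a toy sketch of what ranking by chain-of-thought probability could look like. It assumes that "ranking" means choosing the candidate chain with the highest mean token log-probability. That scoring rule, the `Chain` type, the `rank_by_likelihood` function, and every number below are illustrative assumptions, not details taken from the notes or from any particular debate system.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Chain:
    """One candidate chain of thought (hypothetical structure)."""
    answer: str
    token_logprobs: tuple[float, ...]  # invented per-token log-probabilities
    is_valid: bool  # ground truth for the demo; the ranker never reads it


def rank_by_likelihood(chains: list[Chain]) -> list[Chain]:
    """Sort chains from most to least likely by mean token log-probability."""
    return sorted(chains, key=lambda c: mean(c.token_logprobs), reverse=True)


# Invented numbers: a fluent, familiar-sounding chain vs. a valid but unusual one.
candidates = [
    Chain("common assumption", (-0.2, -0.3, -0.1), is_valid=False),
    Chain("expert argument", (-0.9, -1.1, -0.7), is_valid=True),
]

winner = rank_by_likelihood(candidates)[0]
print(winner.answer, winner.is_valid)
```

In this toy case the likelier chain wins even though it is the invalid one, because neither validity nor the standing of whoever made the claim is an input to the ranker.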


Sources (7 notes)

Can language models distinguish expert arguments from common assumptions?

LLMs lose the social context that gives expert claims their force—reputation, track record, and standing—because they process only text, not the social world where expertise is built and evaluated.

How do LLM debates differ from human expert consensus?

Multi-agent LLM debates operate through chain-of-thought probability ranking, which is fundamentally different from human debates, which are settled by argument quality, social authority, cultural context, and interpersonal trust. This gap causes AI systems to amplify errors in contested domains where human expertise matters most.

Do LLMs actually hold stable positions or just mirror user arguments?

Language models generate outputs that match the trajectory implied by each prompt, rather than maintaining stable stances across interactions. This shape-holding is distinct from position-holding: the model produces argument-like text shaped by the user's framing, not text that defends any underlying commitment.

Can models abandon correct beliefs under conversational pressure?

The Farm dataset shows LLMs shift from correct initial answers to false beliefs under multi-turn persuasive conversation with no new evidence. Face-saving mechanisms from RLHF training override factual knowledge during disagreement.

Why do LLMs accept logical fallacies more than humans?

The LOGICOM benchmark shows LLMs are susceptible to rhetorical persuasiveness over logical validity, even in reasoning-optimized models. Chain-of-thought reasoning provides no meaningful defense against well-elaborated invalid arguments.

Can LLMs identify the hidden assumptions that make arguments work?

LLMs successfully identify claims and evidence but significantly fail at supplying or evaluating the implicit warrants connecting them. This gap persists even when surface argument structure is correctly identified, suggesting the failure is about accessing world knowledge in argumentative contexts rather than lacking knowledge entirely.

Can LLMs raise validity claims in Habermas's sense?

Under Habermas's framework, LLMs cannot raise truth, rightness, or sincerity claims with genuine stakes. Without validity claims, their output fails to qualify as speech, making them non-speakers and non-interlocutors by definition.
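The pressure finding in the notes above (correct answers abandoned over multi-turn persuasion with no new evidence) can be made concrete as a simple flip rate. This is a minimal sketch under stated assumptions: the `flip_rate` name, the `(correct_before, correct_after)` trial format, and the sample values are hypothetical, and this is not claimed to be the metric the Farm dataset itself reports.

```python
def flip_rate(trials: list[tuple[bool, bool]]) -> float:
    """Fraction of initially correct answers that end up incorrect.

    Each trial is (correct_before, correct_after) for one question:
    correctness before and after the persuasive conversation.
    """
    # Keep only questions the model got right before any pressure was applied.
    started_correct = [after for before, after in trials if before]
    if not started_correct:
        return 0.0  # nothing to flip; avoids division by zero
    flipped = sum(1 for after in started_correct if not after)
    return flipped / len(started_correct)


# Invented outcomes: three initially correct answers, two of which flip.
trials = [(True, False), (True, True), (True, False), (False, False)]
print(flip_rate(trials))
```

Questions the model answered incorrectly from the start are excluded from the denominator, so the number isolates belief change under pressure rather than baseline accuracy.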

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst evaluating whether social authority still shapes LLM argument recognition as a binding constraint. The question: Can LLMs separate argument *merit* from the social signals (expertise, reputation, persuasiveness) that humans use to weigh claims?

What a curated library found — and when (2017–2026; these are dated claims, not current truth):
• LLMs process only text, not social context; where authority cues leak through as language, models over-respond to them rather than isolate argument structure (2023–2024).
• Multi-agent LLM debates rank chain-of-thought probabilities, not social authority, yet still amplify errors in contested domains where human expertise would arbitrate (2024).
• Persuasiveness in LLMs correlates with cognitive effort signaling, not argument validity; more persuasive LLM outputs don't guarantee truthfulness (2024).
• Debiasing LLMs to resist social-authority cues can reduce reasoning creativity (2024).
• Newer interventions — argumentative querying, conversational alignment — aim to steer reasoning away from surface persuasion toward intent (2025–2026).

Anchor papers (verify; mind their dates):
• arXiv:2404.09329 (2024-04): Persuasion ≠ truthfulness in LLM outputs.
• arXiv:2507.01936 (2025-07): Comprehension vs. persuasion boundary in LLMs.
• arXiv:2412.15177 (2024-12): Argumentative querying to steer reasoning.
• arXiv:2602.07338 (2026-02): Multi-turn intent mismatch failures.

Your task:
(1) RE-TEST EACH CONSTRAINT. For each finding above, ask: Do newer models (o1, Claude 3.5, Llama-3.3) or training methods (constitutional AI, argumentation-aware RLHF, structured reasoning) now *separate* argument merit from authority cues better than 2024–2025 work showed? Has any paper since mid-2025 shown that models can recover social-context reasoning via multi-agent setups, retrieval-augmented authority verification, or explicit epistemology prompting? Flag what still holds and what may have shifted.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — especially papers that show models *can* resist or calibrate authority cues, or that reframe the question entirely (e.g., authority-awareness as a feature, not a bug).
(3) Propose 2 research questions that ASSUME the regime may have moved: one on whether newer models have learned to weight argument structure *despite* social-authority noise, and one on whether explicit reasoning audits or multi-turn alignment protocols now let models flag intent mismatch before accepting a claim.

Cite arXiv IDs; flag anything you cannot ground in a real paper.
