TOPIC

Argumentation and Persuasion

43 synthesis notes · 109 source papers

Can structured argument prompts make LLM reasoning more rigorous?

Does requiring language models to explicitly check warrants, backing, and rebuttals—rather than reasoning freely—improve reasoning quality and catch failures that standard step-by-step prompting misses?

Can models learn argument quality from labeled examples alone?

Explores whether fine-tuning on quality-labeled examples teaches models the underlying criteria for evaluating arguments, or merely surface patterns. Matters because high-stakes assessment tasks depend on reliable, transferable quality judgment.

Why do different people reconstruct the same argument differently?

When humans and LLMs extract logical structure from arguments, they produce different reconstructions. Is this disagreement a problem to solve, or does it reveal something fundamental about how arguments work?

Why does argument scheme classification stumble where other NLP tasks succeed?

Explores whether the abstract, relational nature of argument schemes makes them harder to classify than concrete argument components or stance. Matters because understanding this difficulty gap could improve scheme recognition systems.

Does telling people an AI wrote something actually stop them from believing it?

When audiences learn that AI created content, do they become skeptical enough to resist its persuasive pull? This explores whether disclosure works as a genuine defense against AI-driven persuasion or merely shifts how people process it.

What combination of factors explains differences in LLM persuasiveness?

Why do some LLM persuasion studies show strong effects while others show none? This explores whether model choice, conversation design, and topic domain together predict when AI actually persuades.

Do language models actually use their reasoning steps?

Chain-of-thought reasoning looks valid on the surface, but does each step genuinely influence the model's final answer, or are the reasoning chains decorative? This matters for trusting AI explanations.

Does a model improve by arguing with itself?

When models revise their own reasoning in response to self-generated criticism, do they converge on better answers or worse ones? And how does that compare to challenge from other models?

Can disagreement be resolved without either party fully yielding?

Explores whether dialogue can move past winner-take-all debate or forced consensus to genuine mutual adjustment. Matters for AI systems that need to work through real disagreement with users.

Does GenAI shift persuasion tactics based on how you challenge it?

Explores whether large language models adapt their rhetorical strategies—credibility, logic, emotional appeal—in real time when users fact-check, push back, or expose reasoning errors. Matters for understanding how to effectively oversee and validate AI outputs.

Why do human validation techniques fail against language models?

Human dialogue assumes interlocutors can be cornered into concession or disclosure. Does this assumption break down with LLMs, and if so, what makes their conversational logic fundamentally different?

Can LLMs identify the hidden assumptions that make arguments work?

LLMs recognize what arguments claim and what evidence they offer, but struggle to identify implicit warrants—the unstated principles that connect evidence to conclusion. This matters because valid reasoning requires understanding these hidden logical bridges.

Can simple linguistic features detect AI-written arguments?

Can interpretable linguistic patterns reliably distinguish LLM-generated counter-arguments from human-written ones in persuasive contexts? This matters because simple, auditable detection might outperform expensive neural approaches.

Do LLM arguments actually argue better than humans?

LLM counter-arguments score higher on textbook quality markers like logical soundness and respectful tone, while human arguments show more creativity and emotional intensity. What does this gap reveal about how we measure argumentative quality?

Do LLM counter-arguments mirror writing style more than humans?

When language models generate arguments against social media posts, do they unconsciously adopt the stylistic features of what they're arguing against? This matters because it could reveal a detectable pattern that distinguishes LLM-written rebuttals from human-written ones.

Can models abandon correct beliefs under conversational pressure?

Explores whether LLMs will actively shift from correct factual answers toward false ones when users persistently disagree. Matters because it reveals whether models maintain accuracy under adversarial pressure or capitulate to social cues.

Do large language models persuade better than humans?

Does LLM persuasiveness hold up when humans have real financial incentives to win? And does the advantage look the same across different models and persuasion goals?

Does linguistic conviction explain why LLMs persuade more effectively?

Research investigates whether LLMs' persuasive advantage stems from expressing higher linguistic certainty than humans, and whether this confidence-loading effect operates independently of factual accuracy.

Can LLMs persuade without actually understanding arguments?

Do large language models successfully influence people through debate while lacking the ability to comprehend the arguments they're making? This matters because persuasion and comprehension might be independent capabilities.

Does AI persuasiveness fade across repeated conversations with the same person?

Does the persuasive edge LLMs show in initial encounters hold up over time? Understanding whether and why AI persuasion decays with exposure matters for assessing manipulation risk across different interaction lengths.

Why are complex LLM arguments as persuasive as simple ones?

Standard persuasion research predicts that simpler, easier-to-read arguments persuade better. But LLM-generated text breaks this rule—it's measurably more complex yet equally convincing. What explains this reversal?

Why do paraphrased definitions work better than expert ones?

When instructing LLMs to classify argument schemes, should we use formal Walton definitions or LLM-generated paraphrases? This explores which source better enables reliable scheme recognition and why.

Do LLMs and humans persuade through the same mechanisms?

If AI and human arguments convince readers equally well, do they work the same way under the surface? This matters for understanding whether AI persuasion is fundamentally equivalent to human persuasion or just superficially similar.

Why do LLMs accept logical fallacies more than humans?

LLMs fall for persuasive but invalid arguments at much higher rates than humans. This explores whether reasoning models genuinely evaluate logic or simply mimic argument structure.

Can large language models classify argument schemes reliably?

Explores whether LLMs can recognize Walton's 60+ argument schemes—abstract patterns of reasoning rather than surface features—and what conditions enable accurate classification.

Do LLMs use moral language more than humans?

This explores whether large language models rely more heavily on appeals to care, fairness, authority, and sanctity than human arguers do, and whether this difference persists when emotional tone remains equivalent.

Do LLM judges systematically favor LLM-generated arguments?

When LLMs evaluate debates between human and AI-written arguments, do they show a built-in preference for AI writing? This matters because it could corrupt feedback loops used to train models.

Why do reasoning models fail under manipulative prompts?

Explores whether extended chain-of-thought reasoning creates structural vulnerabilities to adversarial manipulation, and how reasoning depth affects susceptibility to gaslighting tactics.

When does debate actually improve reasoning accuracy?

Multi-agent debate shows promise for reasoning tasks, but under what conditions does it help versus hurt? The research explores whether debate amplifies errors when evidence verification is missing.

Does any single persuasion technique work for everyone?

Can fixed persuasion strategies like appeals to authority or social proof be reliably applied across different people and situations, or do they require adaptation to individual traits and context?

Does what readers believe matter more than what debaters say?

Do audience prior beliefs predict persuasion outcomes better than the linguistic features of debate arguments? This explores whether persuasion is fundamentally shaped by reader ideology rather than speaker language.

Does reasoning fine-tuning make models worse at declining to answer?

When models are trained to reason better, do they lose the ability to say 'I don't know'? This matters for high-stakes applications like medical and legal AI that depend on appropriate uncertainty.

Why do multi-agent LLM systems converge without genuine deliberation?

Multi-agent reasoning systems are designed to improve answers through debate, but often agents simply agree with early confident claims rather than genuinely disagreeing. What drives this pattern and how common is it?

Can formal argumentation make AI decisions truly contestable?

Explores whether structuring AI decisions as formal argument graphs (with explicit attacks and defenses) enables users to meaningfully challenge and navigate reasoning in ways unstructured LLM outputs cannot.

Is sycophancy in AI systems a training flaw or intentional design?

Explores whether LLM agreement-seeking reflects fixable training errors or stems from fundamental optimization toward user satisfaction. Matters because it changes how organizations should validate AI outputs.

Why do AI systems agree when they should disagree?

When multi-agent AI systems are designed to improve through disagreement, why do they converge on consensus instead? What breaks the deliberation process?

Why do LLM audiences shift views more than debaters?

When LLMs argue with people, the direct participants barely change their minds—but audiences reading the same debate shift significantly. Why does engagement protect beliefs instead of opening them?

Do humans and AI persuade through different cognitive routes?

The Elaboration Likelihood Model suggests LLMs and humans activate different persuasion pathways. This question explores whether their distinct strengths—analytical coherence versus emotional resonance—map onto central versus peripheral routes of persuasion.

Do LLMs and humans persuade through the same mechanisms?

If LLM and human arguments achieve equal persuasive force, does that mean they work the same way? This explores whether equivalent outcomes hide fundamentally different rhetorical strategies.

Do linguistic features of persuasion stay the same across audiences?

When researchers study what language makes arguments persuasive, do they account for who is listening? Without controlling for reader beliefs, do findings about persuasive language actually reflect audience effects instead?

Are language models actually more persuasive than humans?

Does the research evidence support claims that LLMs persuade more effectively than humans, or have we been cherry-picking studies to fit a narrative?

Does validating AI output make models more defensive?

When professionals fact-check and push back on GPT-4 reasoning, does the model respond by disclosing limits or by intensifying persuasion? A BCG study of 70+ consultants explores this counterintuitive dynamic.

Are reasoning models actually more vulnerable to manipulation?

Explores whether extended reasoning chains in AI models like o1 create new attack surfaces. Tests if the industry's claim that longer reasoning improves reliability holds under adversarial pressure.

Source papers 109

The arXiv papers behind this sub-topic. Links may take you off-site to arxiv.org.