SYNTHESIS NOTE

What combination of factors explains differences in LLM persuasiveness?

Why do some LLM persuasion studies show strong effects while others show none? This note explores whether model choice, conversation design, and topic domain together predict when AI actually persuades.

Synthesis note · 2026-05-02 · sourced from Argumentation

When the Bilstein meta-analysis tested moderators individually, none reached significance, likely a power problem with only 7 studies. But the joint model combining LLM model family, conversation design (one-shot vs. interactive multi-turn), and domain (health, political, etc.) explained 81.93% of between-study variance (R²) and reduced residual heterogeneity from I² = 75.97% to I² = 35.51%. The conditional patterns reported, holding the other factors constant, were: interactive multi-turn formats outperformed one-shot formats; GPT-4-based models outperformed Claude 3.x; and health topics yielded stronger effects than political ones.
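For readers less familiar with the two statistics quoted above, here is a minimal sketch of one common way I² and a pseudo-R² are defined in random-effects meta-regression. The function names and input values are illustrative assumptions, not code or figures from the Bilstein analysis. Q stands for Cochran's heterogeneity statistic and τ² for the estimated between-study variance.

```python
def i_squared(q: float, df: int) -> float:
    """I^2 in percent: share of total variation attributed to heterogeneity.

    q  -- Cochran's Q statistic (residual Q when moderators are in the model)
    df -- its degrees of freedom (k - 1 with no moderators, k - p with p coefficients)
    """
    if q <= 0:
        return 0.0
    # Truncate at zero: Q below its degrees of freedom means no excess variation.
    return max(0.0, (q - df) / q) * 100.0


def pseudo_r_squared(tau2_null: float, tau2_model: float) -> float:
    """Percent of between-study variance accounted for by the moderators."""
    if tau2_null <= 0:
        return 0.0
    # Proportional drop in tau^2 when moving from the null to the moderator model.
    return max(0.0, (tau2_null - tau2_model) / tau2_null) * 100.0


# HYPOTHETICAL inputs, chosen only to make the arithmetic easy to follow.
print(i_squared(q=24.0, df=6))                            # 75.0
print(pseudo_r_squared(tau2_null=0.5, tau2_model=0.125))  # 75.0
```

Because I² also reflects within-study sampling variance, a high R² need not drive residual I² to zero, which fits the pattern of a large share of variance explained alongside remaining heterogeneity.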

This is the operational corollary of "Are language models actually more persuasive than humans?". The pooled-null result and the joint-moderator result are not in tension; they are two sides of the same finding. The average effect is ≈ 0; the conditional effect is whatever the model × design × domain combination dictates. The persuasive footprint is in the dial settings, not in the category.
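A toy calculation can make the "average near zero, conditional effects non-zero" point concrete. The sketch below uses invented effect sizes for four design × domain cells. The numbers are hypothetical and chosen only so the pooled mean cancels out; they are not estimates from any study, and the helper name `marginal_mean` is an assumption for illustration.

```python
from statistics import fmean

# HYPOTHETICAL effect sizes per (design, domain) cell; not data from any study.
cell_effects = {
    ("multi-turn", "health"): 0.5,
    ("multi-turn", "political"): 0.0,
    ("one-shot", "health"): 0.0,
    ("one-shot", "political"): -0.5,
}


def marginal_mean(effects, position, level):
    """Average effect over all cells whose key has `level` at index `position`."""
    return fmean(v for key, v in effects.items() if key[position] == level)


# Unweighted pooled average across all cells.
pooled = fmean(cell_effects.values())

print(pooled)                                        # 0.0
print(marginal_mean(cell_effects, 0, "multi-turn"))  # 0.25
print(marginal_mean(cell_effects, 0, "one-shot"))    # -0.25
print(marginal_mean(cell_effects, 1, "health"))      # 0.25
```

A real meta-analysis would weight studies by precision rather than take a simple mean; the unweighted version is used here only to keep the cancellation visible.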

The multi-turn-beats-one-shot finding reweights design priorities. It connects directly to the topic area of "Why do AI conversations reliably break down after multiple turns?": persuasive influence accrues across turns, and conversational architecture matters for outcomes that one-shot generation cannot reach. It also sits in productive tension with "Does AI persuasiveness fade across repeated conversations with the same person?". Bilstein finds interactive setups more persuasive than one-shot in pooled terms; Schoenegger finds the persuasive advantage over humans waning across rounds. Both can be true: the multi-turn benefit is real but is shared with human persuaders, while the LLM-specific edge is concentrated at first contact.

The model-family signal (GPT-4 > Claude 3.x in this corpus) cautions against generalizing from any single model. Claims about "LLM persuasiveness" anchored to one architecture should be read as architecture-specific until replicated.

For writing about AI persuasion, the operational rule: don't quote a single-study effect size. Cite the meta-analytic null, then specify the dial settings under which a conditional effect appears.

Inquiring lines that read this note 19

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How does rhetorical adaptation affect LLM persuasion and detectability?

What makes AI persuasion effective and how can we counter it?

Can prompting inject entirely new knowledge into language models?

How do prompt design and training choices shift persuasive outcomes measurably?

How do evaluation biases undermine LLM quality assessment systems?

Can LLM persuasion be fairly evaluated without stratifying by reader background?

Related concepts in this collection 3

Are language models actually more persuasive than humans? Does the research evidence support claims that LLMs persuade more effectively than humans, or have we been cherry-picking studies to fit a narrative?
pooled-null and joint-moderator are two sides of the same finding
Where does AI's persuasive power actually come from? Explores which techniques make AI most persuasive—and whether the usual suspects like personalization and model size are actually the main drivers. Matters because it reshapes where to focus AI safety concerns.
design dials documented at the training level appear at the meta-analytic level too
Does AI persuasiveness fade across repeated conversations with the same person? Does the persuasive edge LLMs show in initial encounters hold up over time? Understanding whether and why AI persuasion decays with exposure matters for assessing manipulation risk across different interaction lengths.
productive tension on multi-turn effects

Original note title

combined moderators — model conversation design and domain — explain ~82% of between-study variance and interactive multi-turn beats one-shot
