SYNTHESIS NOTE

Can we distinguish helpful explanations from manipulative ones?

Rhetorical strategies used to justify appropriate AI adoption rely on the same persuasion mechanisms as dark patterns. Without observable intent, explanation and manipulation look identical—raising urgent questions about how to audit XAI systems responsibly.

Synthesis note · 2026-05-02 · sourced from Human Centered Design

The Rhetorical XAI paper acknowledges the structural tension at the heart of its own framework. Citing Gray et al. on dark patterns and Chromik et al.'s extension of dark patterns to XAI, it notes that the same rhetorical machinery used to communicate why AI merits appropriate use can be deliberately deployed to exploit cognitive and emotional vulnerability and steer users toward unintended decisions. There is no clean separation between rhetorical XAI for appropriate adoption and rhetorical XAI for coercion. Logos, ethos, and pathos are channels, not intentions; the same persuasive load can recruit cooperation or extract compliance, and the artifact-level signature is identical.

This is not a marginal concern; it is a structural one. If explanation effectiveness depends on rhetorical work, and rhetorical work uses the same mechanisms as dark patterns, then the audit problem becomes severe: the explanation that responsibly justifies adoption looks, from the outside, like the explanation that manipulates. Effectiveness metrics that reward "users acted on the explanation" cannot distinguish appropriate adoption from successful coercion. The distinction lives in the designer's intent and the user's actual interest, neither of which is recoverable from the artifact in isolation.
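The metric problem can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than anything taken from the Rhetorical XAI paper: the `Interaction` record, its field names, and the two toy logs are hypothetical. In particular, `served_user_interest` stands in for exactly the information the note says cannot be recovered from the artifact, so a real audit would have to obtain it some other way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One user encounter with an explanation (hypothetical record)."""
    acted_on_explanation: bool
    # Assumed ground truth: would acting on the explanation serve the
    # user's actual interest? Not observable from the explanation itself.
    served_user_interest: bool


def compliance_rate(log):
    """Share of interactions in which the user acted on the explanation."""
    return sum(i.acted_on_explanation for i in log) / len(log)


def appropriate_adoption_rate(log):
    """Among users who acted, the share for whom acting served their interest."""
    acted = [i for i in log if i.acted_on_explanation]
    if not acted:
        return 0.0
    return sum(i.served_user_interest for i in acted) / len(acted)


# Toy log A: users act on the explanation and it serves them.
calibrating = [
    Interaction(True, True),
    Interaction(True, True),
    Interaction(True, True),
    Interaction(False, True),
]

# Toy log B: same observable behavior, but most who act are worse off.
coercive = [
    Interaction(True, True),
    Interaction(True, False),
    Interaction(True, False),
    Interaction(False, False),
]

# The behavioral metric is identical for both logs...
print(compliance_rate(calibrating), compliance_rate(coercive))
# ...and only the unobservable field separates them.
print(appropriate_adoption_rate(calibrating), appropriate_adoption_rate(coercive))
```

Both logs produce the same compliance rate, so a metric built only on observed behavior scores them the same. The two cases come apart only once the interest field is supplied from outside the artifact, which is the audit gap described above.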

This note forms a related-risk pair with "Does polished AI output trick audiences into trusting it?": both insights describe how persuasive surface form does work that should be done at a different layer (deliberation, expert judgment) without that layer being visible. It also connects to "Do people prefer AI moral reasoning when they don't know the source?": when AI authorship is hidden, persuasion lands; when revealed, it is rejected. Disclosure therefore interacts with rhetorical effectiveness in a way that any responsible XAI deployment has to specify. Hidden rhetorical work is dark by default, even when intentions are clean.

For the False Punditry / Knowledge Custodian writing thread, this is the structural form of the concern. The same explanation that helps a user calibrate trust can be tuned, with no change in form, to over-extract trust. Calling rhetorical XAI "explanation" is itself a rhetorical choice that obscures this — and the field has not yet developed evaluation criteria that hold across the appropriate-adoption / coercion gap.

Inquiring lines that read this note 35

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How do LLMs distinguish causal reasoning from temporal and semantic associations?

What distinguishes emancipatory reason from instrumental reason in practice?

What makes AI persuasion effective and how can we counter it?

How should human oversight be integrated with autonomous AI systems?

Can AI-generated outputs constitute genuine knowledge or valid claims?

Does conversational format create illusions of genuine AI communication?

How can identical external performance mask different internal representations?

What audit techniques best complement each other for detecting hidden model goals?

How can humans calibrate appropriate trust in AI systems?

What makes dialogue-based explanation more successful than monologue?

How do organizational roles and peer interpretations shape what an explanation means?

What mechanisms enable AI systems to generate and spread false beliefs?

How do we evaluate AI systems when user perception misleads actual performance?

Why do LLM chatbots fail as independent therapeutic agents?

What happens when therapeutic AI receives manipulative narratives instead?

Does AI fluency substitute for verifiable accuracy in human judgment?

Why does polished explanation make wrong AI systems more persuasive than poorly explained ones?

What actually drives chain-of-thought reasoning improvements in language models?

How does chain of thought amplify specific forms of rhetorical bullshit?

What factors beyond surface content determine how readers extract meaning differently?

How do agents distinguish between evidence framing and instruction framing in practice?

Related concepts in this collection 3

Does polished AI output trick audiences into trusting it? When AI generates professional-looking graphs, diagrams, and presentations, do audiences mistake visual polish for analytical depth? This matters because appearance might substitute for actual expertise.
related risk; surface form doing work that should be done at a different layer
Do people prefer AI moral reasoning when they don't know the source? Explores whether humans genuinely prefer AI-generated moral justifications or whether source knowledge changes their evaluation. This matters for understanding whether AI reasoning quality is underestimated in real-world deployment.
related; disclosure interacts with rhetorical effectiveness asymmetrically
Are AI explanations really descriptions or adoption arguments? Most XAI work treats explanations as neutral descriptions of model behavior, but they may actually be doing persuasive work to justify AI adoption. What happens when we acknowledge this rhetorical function?
sibling; the adoption-argument function is exactly the function dark patterns exploit

Original note title

rhetorical strategies shade into dark patterns — the same persuasion mechanisms that justify appropriate adoption can manipulate cognitive and emotional vulnerability
