Can we distinguish helpful explanations from manipulative ones?
Rhetorical strategies used to justify appropriate AI adoption rely on the same persuasion mechanisms as dark patterns. Without observable intent, explanation and manipulation look identical—raising urgent questions about how to audit XAI systems responsibly.
The Rhetorical XAI paper acknowledges the structural tension at the heart of its own framework. Citing Gray et al. on dark patterns and Chromik et al.'s extension of dark patterns to XAI, it notes that the same rhetorical machinery used to communicate why AI merits appropriate use can be deliberately deployed to exploit cognitive and emotional vulnerability and steer users toward unintended decisions. There is no clean separation between rhetorical XAI for appropriate adoption and rhetorical XAI for coercion. Logos, ethos, and pathos are channels, not intentions; the same persuasive load can recruit cooperation or extract compliance, and the artifact-level signature is identical.
This is not a marginal concern but a structural one. If explanation effectiveness depends on rhetorical work, and rhetorical work uses the same mechanisms as dark patterns, then the audit problem becomes severe: the explanation that responsibly justifies adoption looks, from the outside, like the explanation that manipulates. Effectiveness metrics that reward "users acted on the explanation" cannot distinguish appropriate adoption from successful coercion. The distinction lives in the designer's intent and the user's actual interest, neither of which is recoverable from the artifact in isolation.
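A toy sketch can make the metric problem concrete. It is not taken from the Rhetorical XAI paper: the `Interaction` record, the two hand-made logs, and the `ai_correct` field are all hypothetical. In particular, `ai_correct` stands in for "acting on the explanation served the user's interest", which is exactly the information an artifact-level audit does not have.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    followed: bool    # observable: the user acted on the explanation
    ai_correct: bool  # hypothetical label: acting on it served the user


def compliance_rate(log):
    """Share of interactions where the user acted on the explanation."""
    return sum(i.followed for i in log) / len(log)


def appropriate_reliance_rate(log):
    """Share where the user followed good advice or declined bad advice."""
    return sum(i.followed == i.ai_correct for i in log) / len(log)


# Hand-made logs (assumed data). Both systems are right 7 times out of 10
# and both are followed 7 times out of 10.
calibrating = (
    [Interaction(True, True)] * 6      # followed good advice
    + [Interaction(True, False)] * 1   # followed bad advice
    + [Interaction(False, True)] * 1   # declined good advice
    + [Interaction(False, False)] * 2  # declined bad advice
)
coercive = (
    [Interaction(True, True)] * 4
    + [Interaction(True, False)] * 3
    + [Interaction(False, True)] * 3
)

for name, log in [("calibrating", calibrating), ("coercive", coercive)]:
    print(name, compliance_rate(log), appropriate_reliance_rate(log))
```

Both logs yield a compliance rate of 0.7, so a metric that rewards acting on the explanation scores them identically. Appropriate reliance comes out at 0.8 for the first log and 0.4 for the second, but computing it requires the `ai_correct` label, which has to come from outside the explanation artifact.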
This note forms a related-risk pair with "Does polished AI output trick audiences into trusting it?": both insights describe how persuasive surface form does work that belongs at a different layer (deliberation, expert judgment) without that layer being visible. It also connects to "Do people prefer AI moral reasoning when they don't know the source?": when AI authorship is hidden, persuasion lands; when it is revealed, the persuasion is rejected. Disclosure therefore interacts with rhetorical effectiveness in a way that any responsible XAI deployment has to specify. Hidden rhetorical work is dark by default, even when intentions are clean.
For the False Punditry / Knowledge Custodian writing thread, this is the structural form of the concern. The same explanation that helps a user calibrate trust can be tuned, with no change in form, to over-extract trust. Calling rhetorical XAI "explanation" is itself a rhetorical choice that obscures this — and the field has not yet developed evaluation criteria that hold across the appropriate-adoption / coercion gap.
Inquiring lines that use this note as a source
- What distinguishes emancipatory reason from instrumental reason in practice?
- Why does renaming the entity change how compelling the argument feels?
- Can humans develop oversight strategies that work across all GenAI rhetorical shifts?
- Does GenAI use different persuasion tactics for different professional audiences or expertise levels?
- What distinguishes genuine cultural understanding from exploited surface-level elimination strategies?
- Can audiences learn to recognize and resist moralized AI rhetoric?
- Can probing methods detect RLHF-induced persuasion in the same way they catch backdoors?
- How do ethos logos and pathos shape AI persuasion under scrutiny?
- What assumptions about oversight fail when AI acts as rhetorical interlocutor?
- Can we design explanations for specific rhetorical situations instead of abstract models?
- Why do stakeholders interpret the same explanation differently in practice?
- How do speech acts like warning differ from neutral information delivery?
- Can content-side interventions reduce AI persuasion where disclosure labels fall short?
- What audit techniques best complement each other for detecting hidden model goals?
- What are rational speech acts and how do they enable AI legibility?
- Why do user studies of explanations fail to predict deployed effectiveness?
- How do organizational roles and peer interpretations shape what an explanation means?
- What mitigation frameworks exist for managing AI persuasion capabilities?
- Why does truth bias prevent people from detecting multiple manipulation tactics?
- Do evidence carriers use a single anomaly direction or distributed mechanisms?
- How do presuppositions exploit the logos-pathos space in explanations?
- Should XAI designers treat explanations as arguments for adoption?
- What happens when therapeutic AI receives manipulative narratives instead?
- How do explanations borrow authority from transparency when describing adoption arguments?
- What design changes if we separate behavior description from adoption justification goals?
- Why does polished explanation make wrong AI systems more persuasive than poorly explained ones?
- How should we evaluate explanations that blur adoption advice with argument?
- What evaluation criteria can hold across legitimate adoption and coercion?
- How do ethical persuasion strategies differ from unethical jailbreak techniques?
- How does chain of thought amplify specific forms of rhetorical bullshit?
- Why do people notice and discount AI persuasion tactics with longer exposure?
- Why do logic-based arguments make AI persuasion feel objective and impartial?
- Can explainability and appropriate trust work against each other?
- How does the observer perspective hide the persuasion route difference?
- How do agents distinguish between evidence framing and instruction framing in practice?
Related concepts in this collection
- Does polished AI output trick audiences into trusting it?
  When AI generates professional-looking graphs, diagrams, and presentations, do audiences mistake visual polish for analytical depth? This matters because appearance might substitute for actual expertise.
  Relation: related risk; surface form doing work that should be done at a different layer
- Do people prefer AI moral reasoning when they don't know the source?
  Explores whether humans genuinely prefer AI-generated moral justifications or whether source knowledge changes their evaluation. This matters for understanding whether AI reasoning quality is underestimated in real-world deployment.
  Relation: related; disclosure interacts with rhetorical effectiveness asymmetrically
- Are AI explanations really descriptions or adoption arguments?
  Most XAI work treats explanations as neutral descriptions of model behavior, but they may actually be doing persuasive work to justify AI adoption. What happens when we acknowledge this rhetorical function?
  Relation: sibling; the adoption-argument function is exactly the function dark patterns exploit
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Rhetorical XAI: Explaining AI’s Benefits as well as its Use via Rhetorical Design
- GenAI as a Power Persuader: How Professionals Get Persuasion Bombed When They Attempt to Validate LLMs
- Agentic Misalignment: How LLMs Could Be Insider Threats
- How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
- A meta-analysis of the persuasive power of large language models
- Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
- Exploring the Role of Prior Beliefs for Argument Persuasion
- Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
Original note title
rhetorical strategies shade into dark patterns — the same persuasion mechanisms that justify appropriate adoption can manipulate cognitive and emotional vulnerability