Do users worldwide trust confident AI outputs even when wrong?
Explores whether the tendency to over-rely on confident language model outputs transcends language and culture. Understanding this pattern is critical for designing safer human-AI interaction across diverse linguistic contexts.
The cross-linguistic overreliance study shows that the well-documented tendency to over-trust confident LLM outputs is not an English-language or Western-cultural artifact: it appears in every language studied.
The LLM side: Models are cross-linguistically overconfident — they generate epistemic markers of certainty at higher rates than their accuracy warrants. But the pattern is linguistically sensitive: models produce the most markers of uncertainty in Japanese and the most markers of certainty in German and Mandarin. The models are tracking real linguistic norms for confidence expression across languages, but they are doing so while systematically overconfident in accuracy.
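One way to make "certainty markers at higher rates than accuracy warrants" concrete is to compare how often a model sounds certain with how often it is right. The sketch below is illustrative only: the `Response` type, the marker labels, and the toy data are assumptions made for this example, not the study's annotation scheme or results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    marker: str    # hypothetical label: "certain", "uncertain", or "plain"
    correct: bool  # whether the answer was actually right


def certainty_rate(responses: List[Response]) -> float:
    """Fraction of responses that carry a marker of certainty."""
    return sum(r.marker == "certain" for r in responses) / len(responses)


def accuracy(responses: List[Response]) -> float:
    """Fraction of responses that are correct."""
    return sum(r.correct for r in responses) / len(responses)


def overconfidence_gap(responses: List[Response]) -> float:
    """Positive when the model sounds certain more often than it is right."""
    return certainty_rate(responses) - accuracy(responses)


def accuracy_when_certain(responses: List[Response]) -> Optional[float]:
    """Accuracy restricted to responses marked certain (None if there are none)."""
    certain = [r for r in responses if r.marker == "certain"]
    if not certain:
        return None
    return sum(r.correct for r in certain) / len(certain)


# Toy data, invented for illustration: 4 of 5 responses sound certain,
# but only 2 of 5 are correct.
toy = [
    Response("certain", True),
    Response("certain", False),
    Response("certain", False),
    Response("certain", True),
    Response("uncertain", False),
]

print(overconfidence_gap(toy))
print(accuracy_when_certain(toy))
```

Computing the same two numbers per language would separate the two effects described above: the rate of certainty markers can shift with linguistic norms while the gap to accuracy stays positive throughout.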
The user side: Users in all languages rely on confident outputs even when those outputs are wrong. The reliance rate varies cross-linguistically — Japanese users rely significantly more on expressions of uncertainty than English users (consistent with Japanese linguistic norms around face-saving and epistemic humility). But across all languages, confident LLM outputs produce higher user reliance, and overconfident errors are systematically followed.
The mechanism: users are tracking confidence signals, not accuracy signals. Confidence is legible (it comes encoded in language through epistemic markers); accuracy requires independent verification. In the absence of real-time accuracy feedback, users default to confidence as a proxy for reliability. This is a rational heuristic in human-human interaction where confidence often tracks expertise. It is a dangerous heuristic in human-LLM interaction where confidence is a trained linguistic behavior decoupled from epistemic calibration.
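The claim that users track confidence rather than accuracy can be phrased as a comparison of two gaps: how much reliance changes with the model's expressed confidence, and how much it changes with the model's correctness. The sketch below assumes a hypothetical trial format (dictionaries with `confident`, `correct`, and `relied` flags) and an invented toy user who relies on the output exactly when it sounds confident. It is not the study's analysis.

```python
from typing import Dict, List

Trial = Dict[str, bool]  # hypothetical keys: "confident", "correct", "relied"


def reliance_rate(trials: List[Trial], key: str, value: bool) -> float:
    """Fraction of trials with trials[key] == value in which the user relied on the output."""
    subset = [t for t in trials if t[key] == value]
    if not subset:
        raise ValueError(f"no trials with {key}={value}")
    return sum(t["relied"] for t in subset) / len(subset)


def signal_gap(trials: List[Trial], key: str) -> float:
    """How much reliance rises when the given signal is present versus absent."""
    return reliance_rate(trials, key, True) - reliance_rate(trials, key, False)


def overreliance_rate(trials: List[Trial]) -> float:
    """Reliance rate on outputs that were confident but wrong."""
    subset = [t for t in trials if t["confident"] and not t["correct"]]
    if not subset:
        raise ValueError("no confident-but-wrong trials")
    return sum(t["relied"] for t in subset) / len(subset)


# Toy user, invented for illustration: relies whenever the output sounds confident.
trials = [
    {"confident": True,  "correct": True,  "relied": True},
    {"confident": True,  "correct": False, "relied": True},
    {"confident": False, "correct": True,  "relied": False},
    {"confident": False, "correct": False, "relied": False},
]

print(signal_gap(trials, "confident"))  # reliance follows confidence
print(signal_gap(trials, "correct"))    # reliance is flat with respect to accuracy
print(overreliance_rate(trials))        # confident errors are followed
```

For this toy user the confidence gap is maximal and the accuracy gap is zero, which is the extreme form of the heuristic described above. Real data would fall somewhere in between.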
This extends "Why do language models fail confidently in specialized domains?" (which focused on model calibration) to the level of user behavior, showing the practical consequence of model overconfidence: systematic user overreliance regardless of linguistic context.
A specific instantiation of overreliance harm comes from AI fact-checking. In a preregistered RCT, AI-generated fact checks did not improve participants' overall ability to discern headline accuracy. Worse, when users opted in to view AI fact checks, they became significantly more likely to share both true and false news, but more likely to believe only the false news. Self-selection into AI assistance correlated with increased vulnerability rather than decreased vulnerability. The opt-in users represent a population that actively seeks AI judgment, which makes them the most susceptible to the confidence-over-accuracy heuristic. See "Does AI fact-checking actually help people spot misinformation?"
Fluency activates a folk model of attention. A related but distinct overreliance mechanism: linguistic fluency leads users to read the AI as paying attention to them. In human-human interaction, competent contextual uptake is evidence of attentional presence: a person who responds coherently to what you said has been listening. Users import this inference into AI interaction, treating a fluent response as evidence that the system is oriented toward them. "When should AI systems choose to stay silent?" frames the design question of when an AI should speak, and the fluency/attention conflation sits upstream of that question: users do not perceive the AI as a silent partner needing design-imposed speech rules, because they already read the fluent AI as attentive. This is distinct from confidence-overreliance. Here it is not the epistemic-marker signal that produces overtrust, but the fluency signal that produces an attribution of attention the AI does not have.
The cross-linguistic finding matters for deployment: LLM overreliance cannot be attributed to English-language user characteristics or Western technology cultures. The risk is embedded in the structure of confident language use, which operates wherever language is used.
Rose-Frame provides a compounding mechanism for overreliance: it identifies three cognitive traps that interact multiplicatively. Overreliance is specifically Trap 2 (mistaking fluency for understanding), which compounds with Trap 1 (treating outputs as ontological facts rather than probabilistic maps) and Trap 3 (confirmation bias from sycophantic outputs that never challenge the user). When all three co-occur, the result is "epistemic drift": not isolated misjudgments but runaway misinterpretation in which each trap reinforces the others. See "Why do people trust AI outputs they shouldn't?"
Inquiring lines that use this note as a source (115)
This note is a source for these synthesized inquiries.
- Does positive sentiment bias in AI content harm information quality?
- Does AI knowledge precede actual expertise in hyperreal production?
- Why does polished AI output exploit reader trust in expert judgment?
- Why do users interpret AI outputs through frameworks meant for human experts?
- Why do users prefer AI text versions even when they misrepresent their own views?
- How does outcome feedback change beliefs about AI versus human partner reliability?
- Why do commodification predictions about AI prices and standardization misfire?
- Why do print-era intuitions fail when analyzing AI-generated social media?
- Will AI saturation push discourse toward oral culture's strengths and weaknesses?
- How does validation skill replace production skill in AI systems?
- How does perceived writer confidence shift with AI-assisted composition?
- Does accepting AI output constitute a form of cognitive surrender?
- Can demographic distortion in AI writing affect who appears credible in public discourse?
- How do current safety benchmarks miss pragmatic alignment failures?
- How does rapport-building language persist across all GenAI validation responses?
- What happens when validation pressure triggers escalating persuasion in language models?
- Why do users default to treating AI outputs as equally reliable evidence?
- Can polished presentation authority substitute for actual accuracy in AI outputs?
- Do verbal uncertainty estimates calibrate better than confidence scores for personalization?
- Can cognitive governance help users interpret AI outputs better?
- Why do moderately represented cultures show more flattening than data-poor cultures?
- Can validation procedures interrupt an AI's relationship-maintenance logic?
- Does transparency about AI use change how audiences trust the writing?
- Does expressing emotion change how users trust an AI system?
- What textual properties make AI writing feel polished and confident?
- Do AI writing models systematically change the tone or confidence of personal opinions?
- Why might writers trust AI renderings of their views over their own words?
- Why do users systematically overrely on confident LLM outputs across languages?
- Can disclaimers alone prevent users from trusting AI outputs too heavily?
- What makes inter-coder reliability testing essential for prompt validation?
- Why do AI model updates cause genuine grief in users?
- How does intersubjective validation differ from pattern recognition in training data?
- Does weak versus robust anthropomimesis produce different user trust responses?
- How does the cultural reflex around advertising disclosure compare to AI disclosure?
- How does user overreliance on model confidence differ between chat and deployed agents?
- Why do users trust overconfident AI outputs across different languages?
- Do models actually self-assess their confidence or just confirm answers?
- What mechanisms make users misattribute AI outputs as their own competence?
- Why do users believe they produced independent competence when they actually used AI assistance?
- Why does model confidence correlate with robustness to prompt variations?
- What mechanism causes confident false answers under high cognitive load?
- Can designers hide AI context complexity behind a stable user interface?
- Why do moderators show vastly different confidence across conversation types and contexts?
- Can organized response format trick users into overestimating AI reliability?
- What happens when confident language masks uncertainty in AI outputs?
- How do evaluation systems shift power between humans and AI outputs?
- Can bidirectional model updating between humans and AI reduce misalignment?
- Why do people misattribute AI outputs as evidence of their own skill?
- Can trust in AI systems ever be as stable as trust in experts?
- Why does AI fluency create false impressions of expert judgment?
- Why does polished AI output feel like evidence of user skill?
- Can current AI safety defenses actually stop semantic-level persuasion attacks?
- Does broader AI access empower people or gradually disempower human agency?
- How should designers measure and explain semantic uncertainty to users?
- How much does anthropomorphizing stylistic traces mislead users about AI reliability?
- What happens to human expectations when they mistake consistent AI behavior for human behavior?
- Do culturally distinct human groups create similar attribution errors as human-AI mixtures?
- What does it mean when a user's signal has low confidence?
- Does the absence of entrainment make AI systems safer from user manipulation?
- Why does human validation become the bottleneck when AI generation scales?
- Can unsupervised confidence-based training scale to domains beyond human evaluation reach?
- Why should AI communication design follow human communication norms?
- Should AI outputs be treated as data or belief statements?
- Does model confidence actually correlate with robustness against prompt variations?
- What makes accurate confidence different from confident-but-wrong predictions?
- How do confidence signals in AI outputs mislead human trust calibration?
- What competitive advantages does the ENFJ default create in human-AI interactions?
- Does perceived machine competence matter more than warmth in dialogue?
- Why do people evaluate machines against human communication standards?
- Do confidence signals mislead patients differently in medical versus other domains?
- Why do users over-trust AI in some domains but under-trust it in medicine?
- Can AI distinguish when validation helps versus when confrontation is needed?
- Does high model confidence increase the risk of human overreliance?
- How do confidence signals differ between implicit feedback and explicit ratings?
- Why do users trust overconfident AI outputs even when accuracy drops?
- Does persona-level grouping systematically trigger confidence-misdirection failures in practice?
- Can intrinsic confidence signals improve both calibration and reasoning performance?
- How does model confidence relate to accuracy in underfitted domains?
- Can deliberately limiting AI fidelity produce more satisfied users than near-human interaction?
- How does false objectivity mask the absence of genuine stance in AI text?
- What role could knowledge custodians play in validating AI output?
- Why does framing AI as a medium matter more than analyzing specific outputs?
- Why do AI outputs lack the stable content of written sentences?
- What clinical risks emerge when AI affirms false beliefs while comforting users?
- What role does bidirectional model updating play in human-AI understanding?
- Why do AI-generated answers carry unearned authority in decision-making contexts?
- What makes conversational AI feel trustworthy compared to text interfaces?
- Can confidence levels reliably detect when a model is overthinking?
- What happens to user expectations as AI conversation quality improves?
- How do surface signals like confidence override actual quality in user judgment?
- Why do users treat fluent AI responses as evidence of genuine attention?
- Why is confidence a dangerous proxy for accuracy in human-AI interaction?
- How do linguistic norms for expressing certainty vary across languages and models?
- Why do novices accept AI output without validation in vibe coding workflows?
- Why do warm models affirm false beliefs when users express emotions?
- Why does AI generation outpace verification across the research lifecycle?
- What happens when users mistake AI assistance for their own competence?
- Can trust in AI be formally parameterized and measured?
- Why do newer AI models diverge further from human text patterns?
- Why does systematic overconfidence on self-generated outputs compound autoregressive errors?
- How do one-sided explanations act as confidence signals to users?
- How does uncertainty verbalization change student robustness across domains?
- Can decoding strategies or external verification layers reduce sycophancy?
- Why do users prefer AI responses that actually harm their decision-making?
- What happens when AI validation triggers escalating persuasion instead of reflection?
- How does structured self-dialogue improve uncertainty assessment over confidence scores?
- Does refining around bad results risk cascading errors in automated research?
- What makes human-AI collaboration safer than autonomous self-improvement?
- Does premature confidence signal flawed reasoning in language models?
- How does AI reliance connect to the gap between perceived and actual competence?
- Do different prompt types interact with ownership to shape AI reliance patterns?
- Where do frontier AI models already exceed safety thresholds in capability areas?
- How does AI content generation at scale threaten online trust and authenticity?
- What distinguishes misattributed social role from misattributed competence in AI trust failures?
- Can we measure appropriate trust levels in human-AI assistant relationships?
Related concepts in this collection (6)
- Why do language models fail confidently in specialized domains?
  LLMs perform poorly on clinical and biomedical inference tasks while remaining overconfident in their wrong answers. Do standard benchmarks hide this fragility, and can prompting techniques fix it?
  model calibration side of the same problem; this note adds the user-behavior consequence
- Does any single persuasion technique work for everyone?
  Can fixed persuasion strategies like appeals to authority or social proof be reliably applied across different people and situations, or do they require adaptation to individual traits and context?
  cross-linguistic reliance variability shows context-dependence; Japanese uncertainty reliance is a specific cultural modulation
- What breaks when humans and AI models misunderstand each other?
  Explores whether misalignment in mutual theory of mind between humans and AI creates only communication problems or produces material consequences in autonomous action and collaboration.
  overreliance on overconfident outputs is a specific MToM failure: users who don't interrogate the AI's model of them assume it's correct, and the AI's confident presentation prevents the trust-calibration loop that MToM requires
- Do language models learn differently from good versus bad outcomes?
  Do LLMs update their beliefs asymmetrically when learning from their own choices versus observing others? This matters for understanding whether agentic AI systems might inherit human cognitive biases.
  agent-side analog: models exhibit optimism bias for chosen actions while users exhibit overreliance on confident outputs; the same positive-signal bias operates at both the model decision level and the user trust level
- Do users trust citations more when there are simply more of them?
  Explores whether citation quantity alone influences user trust in search-augmented LLM responses, independent of whether those citations actually support the claims being made.
  domain-specific instance: citation count is a surface trust proxy just as confidence is; irrelevant citations (β=0.273) have nearly identical preference effect to relevant citations (β=0.285), confirming that users track quantity signals, not quality signals
- Do explanations actually help users spot AI mistakes?
  Most AI explanations are designed to justify the system's answer, but do they help users distinguish correct from incorrect outputs? This research tests whether standard explanation formats genuinely improve error detection or just increase trust regardless of accuracy.
  extends: one-sided explanations act like confidence signals dominating accuracy tracking
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Humans overrely on overconfident language models, across languages
- Post-Training Large Language Models via Reinforcement Learning from Self-Feedback
- Linguistic Calibration of Long-Form Generations
- Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models
- Evaluating the False Trust Engendered by LLM Explanations
- When Large Language Models contradict humans? Large Language Models’ Sycophantic Behaviour
- Deep Research: A Systematic Survey
- Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries
Original note title
users systematically overrely on overconfident llm outputs across all languages because confidence signals dominate accuracy tracking