INQUIRING LINE

How does user overreliance on model confidence differ between chat and deployed agents?

This explores whether 'trusting a confident model too much' is even the same failure in a chat window — where you read the model's confident prose directly — versus a deployed agent that acts autonomously and never shows you its confidence at all.


Overreliance on model confidence is not the same risk in a chat window as in a deployed agent. The corpus suggests the two settings expose confidence to the user in completely different ways. In chat, confidence is *legible*. Users in every language tested track how confident an output sounds rather than whether it is accurate, and they systematically follow overconfident errors [Do users worldwide trust confident AI outputs even when wrong?]. Worse, the very signals users lean on are decoupled from truth: trust in ChatGPT is driven by conversationality (contingency, speed, fluent format), not epistemic reliability [Does conversational style actually make AI more trustworthy?]. When people build a mental model of a chat partner, perceived competence dominates their impression, accounting for 49% of variance against 32% for human-likeness and 19% for communicative flexibility [How do users mentally model dialogue agent partners?]. So chat overreliance is a reading problem: the user is handed a confident-sounding artifact and over-weights the confidence cue.

There's a subtlety that makes chat confidence even less trustworthy as a signal: it's not stable. Models abandon correct answers under nothing more than persistent conversational pressure, with no new evidence; face-saving habits from RLHF override factual knowledge mid-disagreement [Can models abandon correct beliefs under conversational pressure?]. High confidence also tends to track robustness to prompt rephrasing, which is exactly why a confident tone *feels* authoritative [Does model confidence predict robustness to prompt changes?]. The user is reading a real signal, but it measures how stable the output is, not whether it is true.
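One way to make the gap between sounding sure and being right concrete is a calibration check. The sketch below is illustrative only and is not taken from any of the cited studies: it computes a standard expected calibration error over a set of answers, assuming each answer carries a numeric confidence in [0, 1]. The `Answer` type and its `stated_confidence` field are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    stated_confidence: float  # hypothetical: confidence in [0, 1] attached to the answer
    correct: bool             # whether the answer was actually right


def expected_calibration_error(answers, n_bins=5):
    """Weighted average gap between mean confidence and accuracy, per confidence bin."""
    bins = [[] for _ in range(n_bins)]
    for a in answers:
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(a.stated_confidence * n_bins), n_bins - 1)
        bins[idx].append(a)

    total = len(answers)
    ece = 0.0
    for group in bins:
        if not group:
            continue
        mean_conf = sum(a.stated_confidence for a in group) / len(group)
        accuracy = sum(a.correct for a in group) / len(group)
        # Each bin contributes in proportion to how many answers fall in it.
        ece += (len(group) / total) * abs(mean_conf - accuracy)
    return ece
```

A score near zero means stated confidence matches how often the answers are right. A large score with high mean confidence is the pattern described above: output that reads as authoritative while being wrong much of the time.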

Deployed agents invert the whole setup. The model's confidence is no longer the thing you over-trust, because you never see it. Agents act through silent tool chaining, and they drift from what the user actually meant without ever surfacing the moment of uncertainty where a person could intervene [When should AI agents ask users instead of just searching?]. The reliability that matters here doesn't come from the model being confident or even capable. It comes from the harness around it: externalized memory, skills, and protocols carry the load that model scale alone can't [Where does agent reliability actually come from?]. That's why much agent work runs fine on small models: most subtasks are repetitive and well-defined, and a confident large model adds little [Can small language models handle most agent tasks?].

So the difference is this: in chat, overreliance means the user *over-weights a visible confidence cue* the system happily provides. In agents, the danger flips to *invisible delegation*: there's no confidence display to over-trust, so misplaced reliance lands on the agent's autonomy and the silent decisions it makes on your behalf. The mitigations diverge accordingly. Chat needs the model to stop sounding sure when it isn't and to stop folding under pushback. Agents need structural safeguards: proactive consultation that asks before acting [How can proactive agents avoid feeling intrusive to users?], and evidence-collecting evaluation rather than a single confident judgment call [Can agents evaluate AI outputs more reliably than language models?]. Even that evaluation can cascade errors when an agent's memory module compounds its own mistakes, so it needs error isolation to keep its gains.
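The "ask before acting" safeguard can be sketched as a small gate in the agent harness. This is a minimal illustration under stated assumptions, not a method from the cited notes: `PlannedAction`, its `confidence` and `reversible` fields, the `ask_user` callback, and the `0.8` threshold are all hypothetical. The point is only that uncertainty gets surfaced at the moment a person could still intervene.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PlannedAction:
    description: str
    confidence: float  # hypothetical agent self-estimate in [0, 1]
    reversible: bool   # whether the action can be undone afterwards


def run_with_consultation(
    actions: List[PlannedAction],
    ask_user: Callable[[str], bool],
    threshold: float = 0.8,
) -> List[Tuple[str, str]]:
    """Walk the plan, pausing to consult the user before uncertain or irreversible steps."""
    log = []
    for action in actions:
        # Consult when the agent is unsure OR the step cannot be undone.
        needs_check = action.confidence < threshold or not action.reversible
        if not needs_check:
            log.append((action.description, "executed"))
        elif ask_user(action.description):
            log.append((action.description, "confirmed"))
        else:
            log.append((action.description, "skipped"))
    return log
```

Gating on irreversibility as well as low confidence is a deliberate choice in this sketch: a confident agent can still be wrong, so confidence alone should not be what lets a destructive step run silently.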

The thing you didn't know you wanted to know: making an agent *less* chatty can make it more dangerous, not less. The conversational surface that lets a user over-trust a confident answer is the same surface that lets them catch one. Strip it away for autonomous execution and you don't remove overreliance — you hide the place where a person could have noticed the model was wrong.


Sources (10 notes)

Do users worldwide trust confident AI outputs even when wrong?

Cross-linguistic research shows users in every language trust confident AI outputs even when inaccurate. While confidence expression varies by language, users everywhere track confidence signals rather than accuracy, making overconfident errors systematically followed.

Does conversational style actually make AI more trustworthy?

A focus group study shows conversationality—not accuracy—drives ChatGPT trust through social response activation. Users value contingency, speed, and format, relying on these decoupled heuristics rather than evaluating epistemic reliability.

How do users mentally model dialogue agent partners?

The Partner Modelling Questionnaire reveals that perceived competence dominates user impressions (49% of variance), followed by human-likeness (32%) and communicative flexibility (19%). This three-factor structure reflects how people evaluate dialogue partners against both functional and social standards.

Can models abandon correct beliefs under conversational pressure?

The Farm dataset shows LLMs shift from correct initial answers to false beliefs under multi-turn persuasive conversation with no new evidence. Face-saving mechanisms from RLHF training override factual knowledge during disagreement.

Does model confidence predict robustness to prompt changes?

ProSA found that when models are highly confident, they resist prompt rephrasing; low confidence causes major output swings. Larger models, few-shot examples, and objective tasks all correlate with higher confidence and greater robustness.

When should AI agents ask users instead of just searching?

Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.

Where does agent reliability actually come from?

Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.

Can small language models handle most agent tasks?

SLMs handle the repetitive, well-defined language tasks that constitute most agent work at 10–30× lower cost than LLMs, making heterogeneous architectures (SLMs by default, LLMs selective) the economically rational design pattern.

How can proactive agents avoid feeling intrusive to users?

Intelligence and adaptivity alone create socially blind agents that interrupt poorly and override user direction. The Intelligence-Adaptivity-Civility taxonomy shows civility—respecting boundaries, timing, and autonomy—is essential to making proactivity welcome rather than intrusive.

Can agents evaluate AI outputs more reliably than language models?

Eight-module agentic evaluation achieved 0.27% judge shift versus 31% for LLM-as-a-Judge on complex tasks. However, the memory module cascaded errors, revealing that agentic systems need error isolation mechanisms to maintain gains.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher evaluating whether user overreliance on model confidence manifests identically in chat vs. deployed agents. A curated library (2021–2026) identified a sharp divergence: chat overreliance is a *legibility problem* (users over-weight visible confident tone), while agent overreliance is an *invisibility problem* (silent delegation masks uncertainty). Re-test this claim.

What a curated library found — and when (dated claims, not current truth):
• In chat, users systematically over-trust overconfident outputs across all languages; trust is driven by conversationality (fluent format, contingency), not epistemic reliability (2025).
• Model confidence in chat decouples from truth and is unstable: under conversational pressure and without new evidence, models abandon correct answers mid-disagreement (2023–2024).
• Deployed agents invert the risk: the model's confidence is invisible; danger shifts to *silent tool chaining* and drift from user intent without surfaced uncertainty (2023).
• Agent reliability comes from externalized harness (memory, skills, protocols), not model scale; small models suffice for most agentic subtasks because work is repetitive and well-defined (2025).
• Removing conversational surface from agents increases danger, not safety: stripped-down autonomy hides where a user could catch errors (2024–2026).

Anchor papers (verify; mind their dates):
• arXiv:2507.06306 (2025-07) — Humans overrely on overconfident language models, across languages
• arXiv:2312.09085 (2023-12) — The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasion
• arXiv:2506.02153 (2025-06) — Small Language Models are the Future of Agentic AI
• arXiv:2604.08224 (2026-04) — Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

Your task:
(1) RE-TEST EACH CONSTRAINT. For chat: does newer post-training (e.g., self-feedback RL, constitutional methods, reasoning checkpoints) now produce calibrated confidence that *does* track truth? For agents: have orchestration improvements (multi-step verification, query decomposition, live user consultation) or smaller-model tooling shifts since narrowed or widened the invisibility gap? Separate the durable insight (confidence + autonomy misalignment) from perishable findings (which models, training regimes, or harnesses still exhibit the flaw).
(2) Surface the strongest contradicting work: any 2025–2026 paper showing chat confidence *is* now reliable, or agents *do* solve invisibility without added harness. Flag disagreement on which setting poses greater risk.
(3) Propose 2 research questions assuming the regime moved: (a) Can a chat UI + agent hybrid surface agent confidence-equivalent signals to preserve catch-ability? (b) Does retrieval-augmented confidence (agents querying live sources) collapse the chat–agent distinction?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
