INQUIRING LINE

How do confidence signals differ between implicit feedback and explicit ratings?

This explores what 'confidence' means when it comes from behavior you can observe (clicks, watches, purchases) versus a number someone deliberately gives you (a star rating) — and why the two aren't interchangeable.


The cleanest answer in the corpus comes from classic recommender work: implicit feedback carries *two* quantities that an explicit rating squashes into one. When you watch, buy, or click something, the signal splits into **preference** (which direction you lean) and **confidence** (how sure we can be about that lean). A single click is weak evidence; a hundred repeat plays are strong evidence for the same preference (see 'Can implicit feedback reveal both preference and confidence?'). A five-star rating, by contrast, collapses both into one number and throws away the certainty dimension: it tells you the preference but not how much to trust it.
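The split can be sketched in a few lines, following the formulation usually attributed to Hu, Koren, and Volinsky: a binary preference derived from whether any interaction happened, and a confidence weight that grows linearly with the interaction count. The function name and the default `alpha=40.0` are assumptions for illustration; `alpha` is a tunable rate, not a fixed constant.

```python
import numpy as np

def preference_and_confidence(raw_counts, alpha=40.0):
    """Split raw interaction counts into (preference, confidence).

    preference: 1.0 if the user interacted at all, else 0.0
    confidence: 1 + alpha * count, so repeated behavior earns more trust
    alpha is an assumed, tunable hyperparameter.
    """
    r = np.asarray(raw_counts, dtype=float)
    preference = (r > 0).astype(float)
    confidence = 1.0 + alpha * r
    return preference, confidence

def weighted_squared_error(preference, confidence, predicted):
    """Confidence-weighted error: mistakes on well-evidenced items cost more."""
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sum(confidence * (preference - predicted) ** 2))

# No interaction, a single click, and a hundred repeat plays.
counts = np.array([0, 1, 100])
pref, conf = preference_and_confidence(counts)
# pref -> [0., 1., 1.]        the click and the replays express the same preference
# conf -> [1., 41., 4001.]    but the replays carry far more weight
```

Note that the single click and the hundred replays yield the same preference value; only the confidence differs, which is exactly the dimension a lone star rating has no place to store.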

That 'one signal is really two' pattern shows up again in a very different setting: agent feedback. Natural feedback decomposes into an *evaluative* part (how well something went) and a *directive* part (how it should change), and a flat scalar reward captures only the first while discarding the second (see 'Can scalar rewards capture all the information in agent feedback?'). Same lesson under different vocabulary: the moment you compress a rich behavioral or natural signal down to a single rating, you lose a hidden second channel. Explicit ratings are exactly that kind of lossy compression.

There's a catch on the implicit side, though, that explicit ratings mostly dodge: behavioral signals are contaminated by *selection bias*. You only observe clicks on things the system already chose to show, in the positions it showed them. YouTube's ranking team found you have to model that bias explicitly, with a separate shallow position tower, or the system mistakes 'shown at the top' for 'preferred' and amplifies its own past decisions into a feedback loop (see 'Why do ranking systems need to model selection bias explicitly?'). So implicit confidence is richer but dirtier; explicit confidence is thinner but cleaner, because the user gave it deliberately rather than having it inferred from a constrained menu.
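A toy version of the position-tower idea, under heavy assumptions: the "main tower" is reduced to a single weight on a scalar relevance feature, the "position tower" is one learned bias per display slot, and the two logits are added during training. At serving time the position term is dropped, so the score reflects relevance alone. The data is synthetic and every name here is hypothetical; this is a sketch of the additive-bias pattern, not YouTube's actual model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate_logs(n=20000, n_positions=5, seed=0):
    """Synthetic click logs where clicks depend on relevance AND display slot."""
    rng = np.random.default_rng(seed)
    relevance = rng.normal(size=n)                        # stand-in for item/user features
    position = rng.integers(0, n_positions, size=n)       # slot the item was shown in
    true_pos_bias = np.linspace(1.0, -1.0, n_positions)   # top slots attract extra clicks
    logits = 1.5 * relevance + true_pos_bias[position]
    clicks = (rng.random(n) < sigmoid(logits)).astype(float)
    return relevance, position, clicks

def fit_with_position_tower(relevance, position, clicks,
                            n_positions=5, lr=1.0, steps=2000):
    """Gradient descent on log loss with logit = w * relevance + pos_bias[slot]."""
    w = 0.0
    pos_bias = np.zeros(n_positions)
    n = len(clicks)
    for _ in range(steps):
        err = sigmoid(w * relevance + pos_bias[position]) - clicks
        w -= lr * np.mean(err * relevance)
        # Each slot's bias only receives gradient from impressions in that slot.
        pos_bias -= lr * np.bincount(position, weights=err, minlength=n_positions) / n
    return w, pos_bias

def serve_score(w, relevance):
    """Serving drops the position term so ranking uses relevance only."""
    return sigmoid(w * relevance)

relevance, position, clicks = simulate_logs()
w, pos_bias = fit_with_position_tower(relevance, position, clicks)
```

The design point is that the position bias absorbs the 'shown at the top' effect during training, so it does not leak into the weight used for ranking.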

What the corpus also surfaces, and the thing you might not have known to ask, is that confidence as a *signal a human reads off a machine* has its own pathologies, and people track it badly. Users across every language tested follow an AI's expressed confidence rather than its actual accuracy, so a confidently stated wrong answer gets followed systematically (see 'Do users worldwide trust confident AI outputs even when wrong?'). And those confident errors are precisely the ones that hide from aggregate metrics, concentrating in the rare high-harm cases (see 'Why do confident wrong answers hide in standard accuracy metrics?'). The through-line: an *explicit* confidence statement is persuasive but easy to fake or miscalibrate, while *implicit* confidence (how consistently a behavior repeats, or how stable a model's outputs are) is harder to game because it's emitted rather than declared. That is also why model-internal confidence is increasingly mined as a reward signal in its own right (see 'Can model confidence work as a reward signal for reasoning?').
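To make the 'hidden in the aggregate' point concrete, here is a small sketch on made-up numbers. It compares overall accuracy with what happens inside the high-confidence slice, where a user is most likely to follow the answer. The data, the `0.9` threshold, and the function names are all illustrative assumptions.

```python
def accuracy(correct):
    """Fraction of answers that were right."""
    return sum(correct) / len(correct)

def confident_errors(confidence, correct, threshold=0.9):
    """Indices of answers stated with high confidence that were wrong."""
    return [i for i, (c, ok) in enumerate(zip(confidence, correct))
            if c >= threshold and not ok]

def high_confidence_gap(confidence, correct, threshold=0.9):
    """Mean stated confidence minus actual accuracy within the confident slice."""
    picked = [(c, ok) for c, ok in zip(confidence, correct) if c >= threshold]
    mean_conf = sum(c for c, _ in picked) / len(picked)
    acc = sum(ok for _, ok in picked) / len(picked)
    return mean_conf - acc

# Hypothetical stated confidences and whether each answer was correct.
confidence = [0.95, 0.90, 0.92, 0.60, 0.55, 0.97, 0.65, 0.99, 0.58, 0.93]
correct    = [1,    1,    1,    1,    0,    1,    1,    0,    1,    1]

overall = accuracy(correct)                     # 0.8
risky = confident_errors(confidence, correct)   # [7]
gap = high_confidence_gap(confidence, correct)  # positive: overconfident in this slice
```

The headline accuracy looks respectable, yet the single error at index 7 was delivered with the highest stated confidence in the set, which is the case a reader is least likely to double-check.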


Sources (6 notes)

Can implicit feedback reveal both preference and confidence?

Hu, Koren, and Volinsky show that implicit signals (watches, purchases, clicks) encode preference and confidence as two distinct dimensions. Explicit ratings collapse these into one number, losing information about certainty in the preference estimate.

Can scalar rewards capture all the information in agent feedback?

Natural feedback carries two orthogonal types of information: evaluative (how well an action performed) and directive (how it should change). Scalar rewards capture evaluation but discard directional specifics that token-level distillation can recover, making the two complementary rather than redundant.

Why do ranking systems need to model selection bias explicitly?

YouTube's multi-objective ranker uses MMoE for conflicting objectives and a shallow position tower to remove selection bias from training data. Without both mechanisms, models converge on degenerate equilibria that amplify their own past decisions.

Do users worldwide trust confident AI outputs even when wrong?

Cross-linguistic research shows users in every language trust confident AI outputs even when inaccurate. While confidence expression varies by language, users everywhere track confidence signals rather than accuracy, making overconfident errors systematically followed.

Why do confident wrong answers hide in standard accuracy metrics?

Medical triage, legal interpretation, and financial planning show a consistent pattern: surface heuristics conflict with unstated constraints, producing fluent confident errors that concentrate in rare cases where harm occurs. Aggregate accuracy masks these failures because overall performance looks strong.

Can model confidence work as a reward signal for reasoning?

RLSF uses answer-span confidence to rank reasoning traces, creating synthetic preferences that strengthen step-by-step reasoning while reversing RLHF's calibration degradation—without requiring human labels or external verifiers.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a recommender systems and LLM confidence researcher. The question: how do confidence signals differ between implicit feedback (inferred from behavior) and explicit ratings (stated outright)—and which carries more actionable or trustworthy information?

What a curated library found — and when (dated claims, not current truth):
Findings span 2017–2026; treat these as perishable constraints:
  • Implicit feedback decomposes into two magnitudes—preference direction AND confidence strength (e.g., one click vs. 100 replays)—while explicit ratings collapse both into a single lossy number, discarding certainty (2017).
  • Implicit confidence is richer but contaminated by selection bias (shown-at-top mistaken for preferred); explicit confidence is cleaner but thinner because user-chosen rather than inferred (2017–2025).
  • LLM users systematically overrely on expressed confidence rather than actual accuracy across all languages; confident errors hide from aggregate metrics and concentrate in rare high-harm cases (2025-07).
  • Model-internal confidence (stability, consistency) is increasingly mined as intrinsic reward signal, simultaneously restoring calibration and improving reasoning (2025–2026).

Anchor papers (verify; mind their dates):
  • arXiv:1708.05031 (2017) — Neural Collaborative Filtering
  • arXiv:2507.06306 (2025-07) — Humans overrely on overconfident language models, across languages
  • arXiv:2508.06225 (2025-08) — Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution
  • arXiv:2603.12372 (2026-03) — Efficient Reasoning with Balanced Thinking

Your task:
  (1) RE-TEST THE DECOMPOSITION CLAIM. Does the 'two-magnitude' framing still hold for modern implicit signals (e.g., token-level attention, latency, cache-hit patterns) vs. newer explicit ratings (e.g., pairwise comparisons, bandit feedback)? Separately: has selection-bias correction matured enough to make implicit and explicit equally clean, or is the gap still real?
  (2) Surface the strongest CONTRADICTING or SUPERSEDING work on confidence calibration from the last 6 months—especially any showing users *do* track true accuracy when confidence is properly framed, or new methods that make implicit signals equally persuasive as explicit ones.
  (3) Propose 2 research questions that assume the regime may have moved: (a) Can multi-signal fusion (implicit + explicit + model-internal) be optimized to outperform any single channel, and if so, does one channel still dominate in high-stakes decisions? (b) Do post-training methods like self-feedback learning (2025-07) change what 'confidence' means operationally—e.g., by making it a learned artifact rather than a raw signal?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
