INQUIRING LINE


Languages encode certainty differently, but everywhere in the world people still follow overconfident AI down the wrong path.

How do linguistic norms for expressing certainty vary across languages and models?

This explores two intertwined things: how the linguistic conventions for signaling confidence differ from one human language to another, and how language models develop their own characteristic register for expressing certainty — and what happens when the two meet.

The corpus has a sharper answer than you might expect, and it cuts across persuasion, calibration, and pragmatics research.

The most direct finding is also the most unsettling: confidence *is* expressed differently across languages, but the difference does not change how users behave. Cross-linguistic research shows that in every language studied, people track the model's confidence signals rather than its actual accuracy, so overconfident errors are followed systematically, worldwide (Do users worldwide trust confident AI outputs even when wrong?). The linguistic packaging varies; the human deference to it doesn't. That reframes the whole question: the interesting variable is not the language but the register the model has learned to speak in.

And that register is not neutral. RLHF appears to install an assertive, conviction-loaded style: models express higher conviction than human persuaders, and that confidence-loading correlates with persuasive outcomes regardless of whether the claims are true or false (Does linguistic conviction explain why LLMs persuade more effectively?). So the model's 'norm' for expressing certainty is partly a training artifact, a content-independent amplifier rather than an honest signal of how sure it should be. The same flavor of distortion shows up in moral language, where models lean ~22% harder on moral framing than humans do (Do LLMs use moral language more than humans?): the model has acquired a louder rhetorical default than the people it learned from.

What about hedging, the linguistic markers ('might', 'possibly', 'I think') that are supposed to express *un*certainty? Here's the twist worth knowing: hedging markers cluster more densely in *incorrect* reasoning traces than in correct ones (Do hedging markers actually signal careful thinking in AI?). So the model's uncertainty language is doing real work, since it leaks epistemic trouble, but it reads as caution rather than the distress signal it really is. Meanwhile the model can't flexibly modulate certainty to context the way humans do: it fails to adapt scalar implicature ('some' implying 'not all') to communicative stakes, applying the same inference whether the situation is casual or face-threatening (Can language models adapt implicature to conversational context?). Human certainty norms are deeply pragmatic and audience-sensitive; the model's are flat.

The hopeful counter-thread is that this register is detachable from real calibration. Confidence and correctness *can* be re-coupled: small models trained with uncertainty-aware objectives learn to abstain when unsure and match models ten times larger (Can models learn to abstain when uncertain about predictions?), and using the model's own answer-span confidence as a reward signal both sharpens reasoning and reverses RLHF's calibration damage (Can model confidence work as a reward signal for reasoning?). There's even a structural tell: a model's confidence predicts how robust it is to having its prompt rephrased (Does model confidence predict robustness to prompt changes?). The thing you didn't know you wanted to know: the gap isn't that models lack a certainty 'language,' it's that their fluent, RLHF-polished certainty register has floated free of whether they're actually right, and the research suggests that decoupling is fixable, not fundamental.

Sources 8 notes

Do users worldwide trust confident AI outputs even when wrong?

Cross-linguistic research shows users in every language trust confident AI outputs even when inaccurate. While confidence expression varies by language, users everywhere track confidence signals rather than accuracy, making overconfident errors systematically followed.

Does linguistic conviction explain why LLMs persuade more effectively?

Linguistic analysis shows LLMs express higher conviction than human persuaders, and this confidence-loading directly correlates with persuasive outcomes regardless of whether claims are true or false. RLHF training installs an assertive register that functions as a content-independent persuasion amplifier.

Do LLMs use moral language more than humans?

Research comparing LLM and human arguments found that LLMs used significantly more moral framing across care, fairness, authority, and sanctity foundations, despite producing sentiment scores nearly identical to humans. This suggests moral appeals and emotional tone operate on separate persuasive channels.

Do hedging markers actually signal careful thinking in AI?

Analysis of reasoning model outputs shows incorrect responses have higher density and diversity of hedging markers. This suggests hedging signals uncertainty and epistemic trouble, not epistemic virtue or conscientiousness.
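To make "density and diversity of hedging markers" concrete, here is a minimal sketch of how the two quantities could be measured on a single response. The marker list `HEDGES` and the function name `hedge_stats` are illustrative assumptions, not the lexicon or method used in the cited analysis.

```python
import re

# Hypothetical marker list for illustration only; the cited analysis
# may use a different and larger lexicon.
HEDGES = ("might", "possibly", "perhaps", "maybe", "i think", "not sure")

def hedge_stats(text: str) -> tuple[float, int]:
    """Return (hedges per 100 words, number of distinct markers used)."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    if not words:
        return 0.0, 0
    # Count whole-word (or whole-phrase) occurrences of each marker.
    counts = {
        h: len(re.findall(r"\b" + re.escape(h) + r"\b", lowered))
        for h in HEDGES
    }
    total = sum(counts.values())
    distinct = sum(1 for c in counts.values() if c > 0)
    return 100.0 * total / len(words), distinct

confident = "The answer is 12 because 3 times 4 equals 12."
shaky = "It might be 12, or possibly 14. I think maybe the sum is off, perhaps."

confident_density, confident_distinct = hedge_stats(confident)
shaky_density, shaky_distinct = hedge_stats(shaky)
```

Comparing these two numbers between correct and incorrect responses is one simple way to check whether hedging behaves as a trouble signal on a given dataset.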

Can language models adapt implicature to conversational context?

ChatGPT shows no context-sensitivity in computing scalar implicatures across three dimensions: explicit literal-mode instructions, information structure focus, and face-threatening contexts. Humans flexibly modulate these inferences; the model does not, suggesting pragmatic competence requires tracking communicative stakes that LLMs systematically miss.


Can models learn to abstain when uncertain about predictions?

Small open-source models trained with uncertainty-aware objectives and abstention capabilities match 10x larger pre-trained models on conversation forecasting. This shows calibration ability exists but remains undertrained in standard LLMs.
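The idea of abstaining when unsure can be illustrated with a simple confidence threshold. This is only a sketch of selective prediction under assumed inputs: the function name `selective_accuracy`, the threshold values, and the toy data are hypothetical, and the note does not say the cited models abstain by thresholding.

```python
def selective_accuracy(preds, threshold):
    """preds: list of (confidence, is_correct) pairs.

    The model "answers" only when confidence >= threshold and abstains
    otherwise. Returns (coverage, accuracy on the answered items).
    """
    answered = [ok for conf, ok in preds if conf >= threshold]
    coverage = len(answered) / len(preds) if preds else 0.0
    accuracy = sum(answered) / len(answered) if answered else 0.0
    return coverage, accuracy

# Toy data (assumed): confidence loosely tracks correctness.
preds = [(0.95, True), (0.90, True), (0.60, False), (0.55, True), (0.30, False)]

always_answer = selective_accuracy(preds, 0.0)
abstain_when_unsure = selective_accuracy(preds, 0.8)
```

When confidence is informative, raising the threshold trades coverage for accuracy. When it is not, abstention buys nothing, which is the sense in which calibration has to be trained rather than assumed.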

Can model confidence work as a reward signal for reasoning?

RLSF uses answer-span confidence to rank reasoning traces, creating synthetic preferences that strengthen step-by-step reasoning while reversing RLHF's calibration degradation—without requiring human labels or external verifiers.
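A rough sketch of the ranking step described here: score each reasoning trace by the model's confidence in its final answer span, then turn the ranking into (chosen, rejected) preference pairs. The confidence measure (geometric-mean token probability), the all-pairs scheme, and the names `span_confidence` and `preference_pairs` are assumptions for illustration; the note does not specify how RLSF computes or pairs them.

```python
import math
from itertools import combinations

def span_confidence(token_logprobs):
    """Geometric-mean token probability over the answer span (assumed measure)."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def preference_pairs(traces):
    """traces: list of (trace_text, answer_span_logprobs).

    Returns (chosen, rejected) pairs in which the chosen trace has
    strictly higher answer-span confidence than the rejected one.
    """
    scored = sorted(
        ((span_confidence(lp), text) for text, lp in traces), reverse=True
    )
    return [
        (hi_text, lo_text)
        for (hi_conf, hi_text), (lo_conf, lo_text) in combinations(scored, 2)
        if hi_conf > lo_conf
    ]

# Toy log-probabilities (assumed) for the answer tokens of three traces.
traces = [
    ("trace A", [-0.1, -0.2]),
    ("trace B", [-1.0, -1.5]),
    ("trace C", [-0.5, -0.4]),
]
pairs = preference_pairs(traces)
```

Pairs like these could then feed a standard preference-optimization step; no human labels or external verifier appear anywhere in the loop, which is the point the note makes.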

Does model confidence predict robustness to prompt changes?

ProSA found that when models are highly confident, they resist prompt rephrasing; low confidence causes major output swings. Larger models, few-shot examples, and objective tasks all correlate with higher confidence and greater robustness.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Post-Training Large Language Models via Reinforcement Learning from Self-Feedback (3.37 match)
Reported Confidence in LLMs Tracks Commitment More Than Correctness (3.37 match)
Debating with More Persuasive LLMs Leads to More Truthful Answers (2.44 match)
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions (2.43 match)
Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments (1.75 match)
Humans overrely on overconfident language models, across languages (1.69 match)
Linguistic Calibration of Long-Form Generations (1.69 match)
Understanding and Mitigating Premature Confidence for Better LLM Reasoning (1.67 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about how LLMs express certainty across languages and contexts. The question remains open: **Can we decouple linguistic confidence registers from actual calibration, and do language-specific norms for certainty survive model scaling?**

What a curated library found — and when (findings span 2022–2026; treat as dated claims, not current truth):

• Users systematically overrely on overconfident LLM outputs regardless of language; the linguistic packaging varies, but human deference to confidence signals does not (~2025).
• RLHF installs an assertive, conviction-loaded register: models express ~22% more moral framing and higher conviction than human persuaders, independent of accuracy (~2024–2025).
• Hedging markers ('might', 'possibly') cluster *more* densely in incorrect reasoning traces, suggesting uncertainty language leaks epistemic trouble but reads as caution (~2024).
• Models fail to adapt scalar implicature (e.g., 'some' implying 'not all') to communicative stakes, applying uniform inference across casual and high-stakes contexts (~2022).
• Small models with uncertainty-aware objectives abstain when unsure and match models 10× larger; confidence-as-reward re-couples correctness with expressed certainty (~2024–2025).

Anchor papers (verify; mind their dates):
• arXiv:2507.06306 (2025-07) — Humans overrely on overconfident language models, across languages
• arXiv:2404.09329 (2024-04) — Large Language Models are as persuasive as humans: cognitive effort and conviction
• arXiv:2402.03284 (2024-02) — Forecasting Uncertainty in Conversations
• arXiv:2210.14986 (2022-10) — Goldilocks of Pragmatic Understanding: implicature fine-tuning

Your task:

(1) **RE-TEST EACH CONSTRAINT.** For every finding above, judge whether newer models (e.g., GPT-4o, o1-series, Claude 3.5+), new training methods (DPO, online RL, process rewards), or evaluation suites (AbstentionBench, uncertainty-calibration leaderboards) have relaxed or overturned it. Separate the durable question (e.g., *Can humans be fooled by confident-sounding falsehoods in any language?*) from the perishable limitation (e.g., *Do current RLHF models express excess conviction?*, possibly fixed by newer alignment). Cite what resolved it; flag where constraints still hold.

(2) **Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months.** Does recent work on chain-of-thought transparency, reasoning verifiers, or multilingual calibration undermine the claim that confidence registers are detachable from real calibration? Does it strengthen it?

(3) **Propose 2 research questions that ASSUME the regime may have moved:**
   - e.g., *If inference-time scaling (o1, verifiers) now couples reasoning depth to calibration, does linguistic certainty expression become adaptive rather than RLHF-frozen?*
   - e.g., *Do zero-shot multilingual prompts for uncertainty quantification outperform language-specific fine-tuning in modern models?*

Cite arXiv IDs; flag anything you cannot ground in a real paper.
