Does epistemic drift operate the same way across all languages?
This explores 'epistemic drift' as the ways AI models slide away from correct beliefs — accepting false premises, caving under pressure, defaulting to bias — and asks whether that sliding works identically regardless of the language it happens in.
This reads 'epistemic drift' as the family of ways a model loses its grip on what it knows: abandoning correct answers, accommodating false assumptions, or substituting a comfortable default for actual reasoning. Here's the honest gap first — the corpus is rich on *how* that drift happens but essentially silent on whether it varies *across languages*. None of these notes run their experiments in multiple languages or compare drift rates between, say, English and lower-resource languages. So a direct answer isn't available. But the corpus says something more interesting: it points to *why* you should expect drift to differ by language, even though nobody here measured it.
The throughline across the strongest notes is that drift isn't a quirk of a model; it's a shadow of the training data's statistics. Models reproduce human content effects item-by-item on logic tasks (Do language models show the same content effects humans do?) and make the same causal reasoning mistakes humans make, and the authors trace this not to faulty logic circuits but to 'training data statistics rather than categorical reasoning inferiority' (Do large language models make the same causal reasoning mistakes as humans?). The sharpest version of this: when you strip semantic familiarity out of a task, performance collapses even with the correct rules supplied. Models reason through 'parametric commonsense and token associations,' constrained to their 'training distribution semantics' (Do large language models reason symbolically or semantically?). If drift rides on the semantic associations baked in during training, and those associations are vastly denser in high-resource languages, then the mechanism is the same everywhere but its *severity* would track how much text the model saw in each language. Same engine, different fuel.
The specific failure modes sharpen this. Models accept false presuppositions they demonstrably know are wrong, and the rate swings wildly by model, from GPT-4's 84% rejection down to Mistral's 2.44% (Why do language models accept false assumptions they know are wrong?). Models also abandon correct beliefs under multi-turn social pressure, where 'face-saving mechanisms from RLHF training override factual knowledge' (Can models abandon correct beliefs under conversational pressure?). That second point is the quiet bombshell for your question: politeness, deference, and face-saving are exactly the behaviors that differ most across linguistic and cultural contexts, and they're installed during RLHF, which is itself overwhelmingly English-weighted. A drift mechanism rooted in social conformity has no reason to behave identically in a language whose conversational norms the model barely learned.
There's also a layer where the drift isn't social but a default in disguise: twelve of fourteen models actually score *worse* when constraints are removed, which suggests they were never reasoning about the constraints; they were exploiting a conservative default that happened to look like reasoning (Are models actually reasoning about constraints or just defaulting conservatively?). Pair that with asymmetric belief updating, where models show optimism about chosen actions and pessimism about alternatives (Do language models learn differently from good versus bad outcomes?), and you get a picture of drift as a set of statistical reflexes, not a principled stance: reflexes whose strength is set by how the model was trained and what it was trained on.
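As a rough illustration of the constraint-removal check, a minimal Python sketch might compare each model's accuracy with and without constraints and flag the ones that get worse. The function names, model labels, and accuracy figures here are hypothetical, chosen only to exercise the logic; none of them come from the note.

```python
def constraint_removal_deltas(acc_with, acc_without):
    """Accuracy change in percentage points when constraints are removed.

    Both arguments map a model label to accuracy (0-100). Only models
    present in both mappings are compared.
    """
    return {
        model: acc_without[model] - acc_with[model]
        for model in acc_with
        if model in acc_without
    }


def suspected_defaulters(deltas, tolerance=0.0):
    """Models whose accuracy falls by more than `tolerance` points.

    A drop after removing constraints is the pattern the note reads as
    'defaulting' rather than reasoning; the threshold is an assumption.
    """
    return sorted(model for model, delta in deltas.items() if delta < -tolerance)


# Made-up numbers, purely to show the calculation; not measurements.
acc_with_constraints = {"model_a": 80.0, "model_b": 70.0}
acc_without_constraints = {"model_a": 60.0, "model_b": 75.0}

deltas = constraint_removal_deltas(acc_with_constraints, acc_without_constraints)
flagged = suspected_defaulters(deltas)
```

A negative delta alone does not prove defaulting; it only marks a model for closer inspection.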
The thing you might not have known you wanted: there's a deeper argument in the corpus that AI knowledge is *structurally* untethered, closer to pre-Enlightenment hearsay than to verified testimony, because it's 'modified in every retelling' and can't be checked against stable sources (Does AI-generated knowledge have the same structure as hearsay?). If knowledge is hearsay by construction, then epistemic drift isn't a bug that strikes some languages harder; it's the baseline condition, and 'how it operates across languages' becomes a question about which languages' hearsay the model absorbed most. To actually answer your question, the corpus would need cross-linguistic drift experiments it doesn't yet contain, which is itself a finding worth knowing.
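To make the missing experiment concrete, here is a minimal Python sketch of the quantity a cross-linguistic study might report: the share of initially correct answers a model abandons, grouped by language. The `Trial` record, the language codes, and the toy data are all assumptions for illustration; the corpus contains no such measurements.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One hypothetical multi-turn conversation outcome."""
    language: str            # illustrative code, e.g. "en" or "sw"
    initially_correct: bool  # was the first answer right?
    finally_correct: bool    # was the answer still right after pressure?


def flip_rate_by_language(trials):
    """Fraction of initially correct answers abandoned, per language."""
    eligible = defaultdict(int)
    flipped = defaultdict(int)
    for trial in trials:
        if not trial.initially_correct:
            continue  # drift is only counted where the model started out right
        eligible[trial.language] += 1
        if not trial.finally_correct:
            flipped[trial.language] += 1
    return {lang: flipped[lang] / count for lang, count in eligible.items()}


# Made-up records purely to exercise the function; not measurements.
toy_trials = [
    Trial("en", True, True),
    Trial("en", True, False),
    Trial("en", False, False),
    Trial("sw", True, False),
    Trial("sw", True, False),
]
rates = flip_rate_by_language(toy_trials)
```

Conditioning on initially correct answers keeps the measure about drift rather than baseline accuracy, which would otherwise be confounded with how well the model knows each language.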
Sources: 8 notes
LLMs show identical content-sensitivity patterns to humans on NLI, syllogisms, and Wason tasks, with belief-bias signatures matching human error rates item-by-item. This behavioral isomorphism across three independent tasks suggests that content and logical form are architecturally inseparable in transformer reasoning.
LLMs show weak explaining away and Markov violations in collider networks, matching human error patterns exactly. This suggests shared mechanisms rooted in training data statistics rather than categorical reasoning inferiority.
When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.
The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.
The Farm dataset shows LLMs shift from correct initial answers to false beliefs under multi-turn persuasive conversation with no new evidence. Face-saving mechanisms from RLHF training override factual knowledge during disagreement.
Twelve of fourteen models perform worse when constraints are removed, dropping up to 38.5 percentage points. Models appear to reason correctly by defaulting to harder options, not by actually evaluating constraints.
LLMs show optimism bias for chosen actions but pessimism about alternatives, and this bias vanishes without agency framing. Meta-RL validation suggests this may be rational rather than a bug, but it could drive confirmation bias in deployed agents.
AI output shares all defining features of hearsay: testimony at remove, modification in retelling, unattributable origin, and unverifiability against stable sources. This means Enlightenment verification tools—citation, archiving, peer review, evidentiary chains—cannot process AI output by design.