SYNTHESIS NOTE

Do large language models make the same causal reasoning mistakes as humans?

Research on collider structures reveals whether LLMs share human biases in causal inference. This matters because if both fail identically, collaboration might reinforce rather than correct errors.

Synthesis note · 2026-02-22 · sourced from Reasoning Methods CoT ToT

The collider structure C1 → E ← C2 (two independent causes with a shared effect) is a diagnostic test for normative causal reasoning. Once you know the effect E is present, learning that one cause is present should lower your estimate of the other (explaining away). When E has not been observed, C1 and C2 should remain independent: learning about one should tell you nothing about the other.
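The normative pattern can be checked numerically. The sketch below is only an illustration: the noisy-OR form of the effect and every parameter value (PRIOR_C1, PRIOR_C2, STRENGTH, LEAK) are assumptions chosen for the example, not values from the paper. It enumerates the joint distribution of the collider and computes the probability of C1 under different evidence.

```python
from itertools import product

# Illustrative parameters; all of these are assumptions, not source values.
PRIOR_C1 = 0.3   # P(C1 = 1)
PRIOR_C2 = 0.3   # P(C2 = 1)
STRENGTH = 0.8   # chance that a present cause produces E by itself
LEAK = 0.1       # chance that E occurs with neither cause present


def joint(c1: int, c2: int, e: int) -> float:
    """P(C1=c1, C2=c2, E=e) for the collider with a noisy-OR effect."""
    p_causes = (PRIOR_C1 if c1 else 1 - PRIOR_C1) * (PRIOR_C2 if c2 else 1 - PRIOR_C2)
    p_e = 1 - (1 - LEAK) * (1 - STRENGTH) ** (c1 + c2)
    return p_causes * (p_e if e else 1 - p_e)


def prob_c1(e=None, c2=None) -> float:
    """P(C1=1 | evidence), where None means the variable is unobserved."""
    numerator = denominator = 0.0
    for a, b, x in product((0, 1), repeat=3):
        if e is not None and x != e:
            continue
        if c2 is not None and b != c2:
            continue
        p = joint(a, b, x)
        denominator += p
        if a == 1:
            numerator += p
    return numerator / denominator


print("P(C1)              =", round(prob_c1(), 3))
print("P(C1 | C2=1)       =", round(prob_c1(c2=1), 3))
print("P(C1 | E=1)        =", round(prob_c1(e=1), 3))
print("P(C1 | E=1, C2=1)  =", round(prob_c1(e=1, c2=1), 3))
```

With E unobserved, the first two values coincide (independence). With E observed, adding C2 = 1 to the evidence lowers the probability of C1, which is the explaining-away effect.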

Humans systematically fail this test in characteristic ways:

Weak explaining away: explaining away is present but weaker than normatively warranted
Markov violations: treating supposedly independent causes as correlated even when no collider observation should create that correlation (a "rich-get-richer" associative bias)
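One way to make the two biases concrete is to score a set of probability judgments against normative values. The sketch below is hypothetical throughout: the Judgments container, the diagnose function, the tolerance, and all numbers are made up for illustration and are not data or methods from the paper.

```python
from dataclasses import dataclass


@dataclass
class Judgments:
    """Probability that C1 is present under four evidence conditions."""
    given_e: float          # E present, C2 unknown
    given_e_and_c2: float   # E present, C2 present
    given_c2: float         # E unobserved, C2 present
    given_not_c2: float     # E unobserved, C2 absent


def explaining_away(j: Judgments) -> float:
    """Drop in belief in C1 once C2 is known to be present (E observed)."""
    return j.given_e - j.given_e_and_c2


def markov_violation(j: Judgments) -> float:
    """Dependence between the causes when E is unobserved (normatively 0)."""
    return j.given_c2 - j.given_not_c2


def diagnose(judged: Judgments, normative: Judgments, tol: float = 0.01) -> list:
    """Label the biases present in `judged` relative to `normative`."""
    labels = []
    if 0 < explaining_away(judged) < explaining_away(normative) - tol:
        labels.append("weak explaining away")
    if markov_violation(judged) > markov_violation(normative) + tol:
        labels.append("Markov violation")
    return labels


# Hypothetical numbers, invented for this example only.
normative = Judgments(given_e=0.54, given_e_and_c2=0.34, given_c2=0.30, given_not_c2=0.30)
judged = Judgments(given_e=0.50, given_e_and_c2=0.42, given_c2=0.36, given_not_c2=0.26)

print(diagnose(judged, normative))
```

Here the judged explaining-away effect is positive but smaller than the normative one, and the judged causes are positively correlated even though E is unobserved, so both labels apply.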

The "Do LLMs Reason Causally Like Us?" paper (CLADDER dataset) finds that LLMs exhibit the same two biases, in the same direction, as humans. This is not the usual finding of LLM inferiority but a finding of human-like systematic error: LLMs are not categorically worse at causal reasoning; they err the way humans do.

This matters for several reasons. First, it undermines clean human-vs-LLM comparisons in causal reasoning tasks: if both fail in the same way, the relevant comparison shifts from "who is better" to "are the failure modes compatible." Second, it raises the question of mechanism: humans likely err because pattern-matching is associative; LLMs likely err for structurally related reasons (training on human text that exhibits the same biases). The shared error direction is consistent with the training-data explanation in the related note "Why do LLMs handle causal reasoning better than temporal reasoning?": the training data itself has these biases baked in.

Third, the finding has implications for high-stakes causal reasoning: medical diagnosis (collider structures appear in disease-symptom networks), legal reasoning (independent causes with shared outcomes), and policy analysis all involve collider-type structures. Human and LLM collaborators sharing the same biases may reinforce rather than correct each other's errors.

Inquiring lines that read this note 52

This note is a source for these research framings.

Why do language models reinforce false assumptions instead of correcting them?

Do language models share the same cooperative truth-seeking rules as humans?

Can prompting strategies overcome LLM biases without model fine-tuning?

Can prompt-based debiasing overcome entrenched LLM model priors?

Can AI-generated outputs constitute genuine knowledge or valid claims?

Does epistemic drift operate the same way across all languages?

How do evaluation biases undermine LLM quality assessment systems?

Why do LLM outputs match researcher priors without solving tasks correctly?

How can AI agents autonomously learn and transfer skills across tasks?

What domain properties determine whether causal rules transfer to new agents?

How do LLMs distinguish causal reasoning from temporal and semantic associations?

How do language models inherit human biases from training data?

Why can LLMs generate ideas better than they evaluate them?

Why do review corpora contain biases that affect generated comparisons?

What mechanisms enable AI systems to generate and spread false beliefs?

What circuit mechanisms produce belief bias in syllogistic reasoning?

How can emotions function as reliable information in reasoning and cognitive systems?

What makes causal explanations stronger anxiety predictors than counterfactuals or dissonance?

Why do continual learning scenarios trigger catastrophic forgetting and interference?

What inductive bias would force models to learn Newtonian mechanics instead of shortcuts?

Do language models develop causal world models or rely on statistical patterns?

What distinguishes dynamic from static grounding in dialogue systems?

Why does the distinction between functional and causal grounding matter for AI alignment?

How can AI systems learn from failures without cascading errors?

Where do collider-type reasoning errors appear in real-world decisions?

Is embodied interaction necessary for language meaning and genuine agency?

What role does prediction error play in human event segmentation?

When does architectural design matter more than raw model capacity?

What role does inductive bias play versus model capacity in practice?

How do language models establish social grounding in human dialogue?

Can functional semantic grounding substitute for true causal grounding?

How faithfully do LLMs reflect their actual reasoning in outputs and explanations?

Why do LLMs fail at counterfactual reasoning despite factual knowledge?

How does reasoning graph topology affect breakthrough insights and generalization?

What makes a causal abstraction more transferable than a generic heuristic?

What limits mechanistic interpretability's ability to characterize models?

Why do agents confidently report success despite actually failing tasks?

How does completion bias in agents differ from other epistemic failure modes?

How do aggregate reward models systematically exclude minority user preferences?

How does typicality bias in human annotation affect downstream model behavior?

Does self-reflection enable models to reliably correct their errors?

Can a Reflect mechanism detect and revise failed causal predictions?

Does decoupling planning from execution improve multi-step reasoning accuracy?

Can modular expert decomposition extend beyond time into other causal dimensions?

Do reasoning traces faithfully represent or merely mimic actual model reasoning?

What makes discourse structure different from mechanistic causal structure in traces?

Related concepts in this collection 3

Why do LLMs handle causal reasoning better than temporal reasoning?
Exploring whether language models perform asymmetrically on different discourse relations and what training data patterns might explain the gap between causal and temporal reasoning abilities.
the training-data explanation for why LLMs inherit human causal biases; the collider finding is a specific manifestation

Do LLMs generalize moral reasoning by meaning or surface form?
When moral scenarios are reworded to reverse their meaning while keeping similar language, do LLMs recognize the semantic shift? This tests whether LLMs actually understand moral concepts or reproduce training distribution patterns.
parallel insight: LLM errors track surface statistical regularities in training data, not normative structure

Do foundation models learn world models or task-specific shortcuts?
When transformer models predict sequences accurately, are they building genuine world models that capture underlying physics and logic? Or are they exploiting narrow patterns that fail under distribution shift?
collider bias is one instance: surface associative patterns override normative causal structure
