INQUIRING LINE


AI that reasons in cause-and-effect chains still can't handle gut feeling or analogy — can those gaps be filled?

Can causal models be extended to include non-causal cognition?

This explores whether causal-reasoning models — the kind built to map cause-and-effect — can be stretched to cover the parts of human thinking that aren't causal at all: gut association, analogy, and emotion.

The corpus has a direct answer, and it is a candid one: causal models capture only a slice of how humans actually reason. The GenMinds work on causal belief networks is explicit that, while these networks are excellent at representing cause-and-effect, they cannot represent associative links, analogical mappings, or emotion-driven shifts in belief. Crucially, its authors frame causal belief networks not as a finished theory but as a tractable starting point (see "Can causal models alone capture how humans actually reason?"). So the question's premise is already conceded inside the research: causality alone leaves gaps, and the interesting work is in what gets added on top.

The reason this matters becomes clearer when you look at where causal reasoning is strong and where it isn't. In LLMs, causal reasoning consistently outperforms temporal reasoning, not because cause-and-effect is cognitively deeper, but because causal connectives ('because', 'therefore') appear explicitly and frequently in training text, while temporal order usually has to be inferred (see "Why do LLMs handle causal reasoning better than temporal reasoning?"). That hints at something important for this question: the 'causal' competence we observe may itself be a surface artifact of language statistics rather than a clean reasoning module. Models also reproduce human causal *errors*, namely weak explaining-away and Markov violations in collider networks, which suggests the same statistical substrate drives both the causal and the messier non-causal parts of cognition (see "Do large language models make the same causal reasoning mistakes as humans?"). If causal and associative reasoning share one substrate, extending one to include the other may be less about adding a new module and more about acknowledging they were never fully separate.
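For readers unfamiliar with the two error types named above, here is a minimal sketch of what a normative reasoner does in a collider network (two independent causes, one shared effect). It is not taken from any of the cited papers: the noisy-OR parameterisation, the parameter values, and the function names `joint` and `prob` are all illustrative assumptions.

```python
from itertools import product

def joint(c1, c2, e, p_c1=0.3, p_c2=0.3, w1=0.8, w2=0.8, leak=0.05):
    """Joint probability of one world in the collider C1 -> E <- C2.

    All parameter values are hypothetical, chosen only for illustration.
    """
    p_causes = (p_c1 if c1 else 1 - p_c1) * (p_c2 if c2 else 1 - p_c2)
    # Noisy-OR: E is absent only if the leak and every present cause fail.
    p_e = 1 - (1 - leak) * (1 - w1) ** c1 * (1 - w2) ** c2
    return p_causes * (p_e if e else 1 - p_e)

def prob(query, given):
    """P(query | given) by brute-force enumeration of all 8 worlds."""
    num = den = 0.0
    for c1, c2, e in product([0, 1], repeat=3):
        world = {"c1": c1, "c2": c2, "e": e}
        if all(world[k] == v for k, v in given.items()):
            p = joint(c1, c2, e)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den

# Explaining away: learning that C2 is present should lower belief in C1.
p_c1_given_e = prob({"c1": 1}, {"e": 1})
p_c1_given_e_c2 = prob({"c1": 1}, {"e": 1, "c2": 1})

# Markov condition: with E unobserved, C2 should say nothing about C1.
p_c1_given_c2 = prob({"c1": 1}, {"c2": 1})

print(p_c1_given_e, p_c1_given_e_c2, p_c1_given_c2)
```

In this sketch, "weak explaining-away" would correspond to a reasoner whose drop from the first value to the second is smaller than the computed one, and a "Markov violation" to a reasoner whose third value departs from the prior on C1 even though the effect is unobserved.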

There's a methodological cousin to this in interpretability research that reframes the whole question. To actually understand what a network is doing, you need *both* representational analysis (what is encoded: the correlational, associative picture) and causal analysis (what intervening actually changes). Neither alone is sufficient: representations show correlations without causation, and causal tests show effects without explaining them (see "Can we understand LLM mechanisms with only representational analysis?"). That is essentially the same shape as this question one level up: a complete account of cognition needs the causal scaffolding *and* the associative, representational layer working together.
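The gap between the two kinds of analysis can be shown on a toy system. The sketch below is an assumption-laden illustration, not a method from the cited note: the two-unit "network", the threshold probe, and the names `hidden`, `output`, `probe_accuracy`, and `ablation_effect` are all hypothetical.

```python
import random

def hidden(x):
    # Hypothetical two-unit representation: unit 0 feeds the output,
    # unit 1 is a redundant copy that the output never reads.
    return [x, x]

def output(h):
    # The toy model's decision depends on unit 0 only.
    return 1 if h[0] > 0 else 0

def probe_accuracy(unit, data):
    """Representational analysis: can a threshold probe on one unit decode the label?"""
    hits = sum((hidden(x)[unit] > 0) == bool(y) for x, y in data)
    return hits / len(data)

def ablation_effect(unit, data):
    """Causal analysis: fraction of outputs that change when the unit is zeroed."""
    changed = 0
    for x, _ in data:
        h = hidden(x)
        patched = list(h)
        patched[unit] = 0.0  # intervene on a single unit
        changed += output(h) != output(patched)
    return changed / len(data)

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(200)]
data = [(x, 1 if x > 0 else 0) for x in xs]

for unit in (0, 1):
    print(unit, probe_accuracy(unit, data), ablation_effect(unit, data))
```

Both units decode the label perfectly, so a probe alone cannot tell them apart; only the ablation shows that unit 0 drives behaviour and unit 1 does not. Conversely, the ablation alone says that unit 0 matters without saying what it encodes.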

Where the corpus goes quiet is on a concrete unified architecture: nobody here has built the causal-plus-associative-plus-emotional model the GenMinds authors gesture toward. But the surrounding notes suggest what a non-causal extension would have to absorb: reasoning that draws on broad, transferable procedural patterns rather than fact lookup (see "Does procedural knowledge drive reasoning more than factual retrieval?"), and a structural split in which factual knowledge lives in lower network layers while reasoning adjustment happens higher up (see "Why does reasoning training help math but hurt medical tasks?"). The unexpected takeaway: the obstacle to extending causal models isn't that non-causal cognition is exotic. It is that causal reasoning itself may already *be* a special, well-labeled case of the same associative machinery, which means 'extension' might really mean dropping the assumption that causality was ever a clean, separable thing.

Sources (6 notes)

Can causal models alone capture how humans actually reason?

Causal belief networks excel at modeling causal reasoning but cannot represent associative links, analogical mappings, or emotion-driven belief shifts. The GenMinds framework itself presents these networks as a tractable starting point rather than a complete theory.

Why do LLMs handle causal reasoning better than temporal reasoning?

ChatGPT excels at causal relations but struggles with temporal ordering because causal connectives are explicit and frequent in training data, while temporal order is often implicit and must be inferred contextually.

Do large language models make the same causal reasoning mistakes as humans?

LLMs show weak explaining away and Markov violations in collider networks, matching human error patterns exactly. This suggests shared mechanisms rooted in training data statistics rather than categorical reasoning inferiority.

Can we understand LLM mechanisms with only representational analysis?

Representational analysis alone identifies correlations without causation; causal analysis alone shows behavioral effects without explaining them. Only paired methods—locating candidate features representationally, then verifying causally—produce complete mechanistic claims.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.


Why does reasoning training help math but hurt medical tasks?

Two-phase inference model shows knowledge retrieval operates in lower network layers while reasoning adjustment happens in higher layers. This separation explains why reasoning training improves math but can degrade knowledge-intensive domains like medicine.

Papers this line draws on (8)

The research behind the notes this line reads, ranked by how closely each paper relates.

Do Large Language Models Reason Causally Like Us? Even Better? · 2.54 match · arXiv
Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning · 2.44 match · arXiv
Premise Order Matters in Reasoning with Large Language Models · 1.69 match · arXiv
LLMs can implicitly learn from mistakes in-context · 1.68 match · arXiv
Eliciting Reasoning in Language Models with Cognitive Tools · 1.66 match · arXiv
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens · 1.65 match · arXiv
Mitigating Hallucinations in Large Language Models via Causal Reasoning · 1.61 match · arXiv
LLM Reasoning Is Latent, Not the Chain of Thought · 1.58 match · arXiv

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a cognitive science researcher probing whether causal-reasoning frameworks can genuinely extend to non-causal cognition, or whether that boundary is fundamental. The question remains open.

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat each as time-stamped, possibly superseded:
• Causal reasoning in LLMs consistently outperforms temporal reasoning, likely because causal connectives appear explicitly in training text rather than due to deeper cognitive structure (2025).
• LLMs exhibit human-like causal *errors* (weak explaining-away, Markov violations), suggesting causal and associative reasoning share a single statistical substrate, not separate modules (2025).
• Mechanistic understanding requires *both* representational analysis (correlational/associative picture) *and* causal intervention testing; neither alone suffices (2025).
• Procedural knowledge in pretraining drives reasoning generalization; factual knowledge resides in lower layers, reasoning adjustment in higher layers (2024).
• No unified architecture yet integrates causal, associative, and emotion-driven belief shifts—GenMinds framed this as tractable, not solved (2025).

Anchor papers (verify; mind their dates):
• arXiv:2502.10215 (2025-02) – Do Large Language Models Reason Causally Like Us?
• arXiv:2507.18178 (2025-07) – Decoupling Knowledge and Reasoning: Cognitive Dual-System Theory
• arXiv:2508.04495 (2025-08) – Causal Reflection with Language Models
• arXiv:2604.08756 (2026-04) – Artifacts as Memory Beyond the Agent Boundary

Your task:
(1) RE-TEST EACH CONSTRAINT. For causal-reasoning superiority: has test-time scaling, multi-step orchestration (memory, caching, structured prompting), or new evaluation harnesses since mid-2025 *dissolved* the surface advantage? For the shared-substrate claim: do any newer mechanistic studies isolate a *truly* separable causal module, or do they strengthen the unified-substrate view? Does the dual-system framing (2507.18178) hold up under 2026 models? Plainly state which constraints still appear real and which are perishable.
(2) Surface the strongest work from the last ~6 months that *contradicts* or *supersedes* the unified-substrate hypothesis—i.e., evidence for genuine causal isolation or a discrete non-causal mode.
(3) Propose 2 research questions that assume the boundary may have shifted: (a) if causal and associative are one substrate, what architectural change (not just prompt engineering) would cleanly *separate* them? (b) what benchmark would definitively show whether emotion-driven or analogical shifts can be *predicted* from causal-reasoning machinery alone?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
