Large Causal Models From Large Language Models

Paper · arXiv 2512.07796 · Published December 8, 2025

We introduce a new paradigm for building large causal models (LCMs) that exploits the enormous potential latent in today’s large language models (LLMs). We describe our ongoing experiments with an implemented system called DEMOCRITUS (Decentralized Extraction of Manifold Ontologies of Causal Relations Integrating Topos Universal Slices), aimed at building, organizing, and visualizing LCMs that span disparate domains and are extracted from carefully targeted textual queries to LLMs. DEMOCRITUS is methodologically distinct from traditional narrow-domain, hypothesis-centered causal inference, which builds causal models from experiments that produce numerical data. A high-quality LLM (e.g., the 80-billion-parameter Qwen3-Next-80B-A3B-Instruct) is used to propose topics, generate causal questions, and extract plausible causal statements from a diverse range of domains. The technical challenge is then to take these isolated, fragmented, potentially ambiguous, and possibly conflicting causal claims and weave them into a coherent whole, converting them into relational causal triples and embedding them into an LCM.
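As a rough illustration of the last step, turning free-text causal statements into relational triples and collecting them into a graph, the following is a minimal sketch only. DEMOCRITUS uses an LLM for extraction; the regular expression here is a deliberately simple stand-in, and the names `CausalTriple`, `RELATIONS`, `extract_triple`, and `build_graph` are hypothetical and do not come from the paper.

```python
import re
from collections import defaultdict
from typing import NamedTuple, Optional


class CausalTriple(NamedTuple):
    cause: str
    relation: str
    effect: str


# Hypothetical relation vocabulary, chosen only for this illustration.
RELATIONS = ("causes", "increases", "reduces", "leads to")
_PATTERN = re.compile(
    r"^\s*(?P<cause>.+?)\s+(?P<rel>"
    + "|".join(re.escape(r) for r in RELATIONS)
    + r")\s+(?P<effect>.+?)\s*\.?\s*$",
    re.IGNORECASE,
)


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so equivalent phrases share a node."""
    return " ".join(text.lower().split())


def extract_triple(statement: str) -> Optional[CausalTriple]:
    """Regex stand-in for the LLM extraction step (an assumption, not the paper's method)."""
    match = _PATTERN.match(statement)
    if match is None:
        return None
    return CausalTriple(
        normalize(match["cause"]), normalize(match["rel"]), normalize(match["effect"])
    )


def build_graph(statements):
    """Map each cause to the set of (relation, effect) pairs asserted about it."""
    graph = defaultdict(set)
    for statement in statements:
        triple = extract_triple(statement)
        if triple is not None:
            graph[triple.cause].add((triple.relation, triple.effect))
    return dict(graph)


statements = [
    "Prolonged drought reduces river discharge.",
    "Reduced river discharge leads to crop failure",
    "This sentence has no causal verb.",
]
graph = build_graph(statements)
```

Statements that match no relation are dropped rather than guessed at. Resolving ambiguous or conflicting claims, which the abstract identifies as the real technical challenge, is outside the scope of this sketch.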

Introduction. We introduce a new paradigm for building large causal models (LCMs) that exploits the enormous potential latent in today’s large language models (LLMs) (Bommasani et al., 2022; DeepSeek-AI et al., 2025). Much of the decades-long effort in causal discovery (Imbens and Rubin, 2015; Pearl, 2009; Zanga and Stella, 2023) has focused on constructing causal knowledge from carefully controlled, highly specialized, topically narrow studies in particular domains that typically yield numerical data. DEMOCRITUS is a methodologically distinct enterprise: build LCMs spanning potentially hundreds of distinct domains and ranging over millions of very specific causal claims by carefully combining the vast knowledge latent in LLMs with state-of-the-art categorical causal methods (Mahadevan, 2025b; Fritz, 2020) and deep learning methods (Fong et al., 2019; Mahadevan, 2024; Gavranović et al., 2024). Our goal in this paper is to showcase the potential of DEMOCRITUS.

Discussion / Conclusion. DEMOCRITUS, as presented here, is deliberately modular: the LLM choice, triple extraction, Geometric Transformer (GT) architecture, and manifold visualization can all be upgraded independently. The key lessons from our initial experiments are: In this paper, we have treated causal structure primarily as a directed, mostly acyclic graph over variables and mechanisms extracted from language. This DAG-like perspective is already useful for exploration and hypothesis generation, but many of the domains we care about are fundamentally dynamical. For example, the Indus Valley case study in our archaeology slice involves climate model simulations, hydrological models of Indus River discharge, and multi-decadal drought episodes, which are naturally described by systems of differential equations, state-space models, or agent-based simulators. We leave this direction for future work. Our focus here is on showing that LLMs and Geometric Transformers can already produce rich, structured causal maps across economics, biology, and archaeology.
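To make the phrase "directed, mostly acyclic" concrete, one could check where an extracted graph departs from a DAG. The sketch below uses Kahn's topological-sort algorithm, a standard technique that is not claimed to be part of DEMOCRITUS; the function name `unsortable_nodes` and the example edges are hypothetical.

```python
from collections import deque


def unsortable_nodes(edges):
    """Return the nodes Kahn's algorithm cannot order.

    These are exactly the nodes that lie on a directed cycle or are
    reachable from one; an empty result means the graph is a DAG.
    """
    nodes = set()
    successors = {}
    indegree = {}
    for cause, effect in set(edges):  # set() drops duplicate edges
        nodes.update((cause, effect))
        successors.setdefault(cause, []).append(effect)
        indegree[effect] = indegree.get(effect, 0) + 1

    # Start from nodes with no incoming edges and peel the graph layer by layer.
    queue = deque(n for n in nodes if indegree.get(n, 0) == 0)
    ordered = set()
    while queue:
        node = queue.popleft()
        ordered.add(node)
        for succ in successors.get(node, []):
            indegree[succ] -= 1
            if indegree[succ] == 0:
                queue.append(succ)
    return nodes - ordered


# Hypothetical edges: a simple chain, then the same chain with a feedback loop.
chain = [("drought", "low discharge"), ("low discharge", "crop failure")]
with_feedback = chain + [
    ("crop failure", "migration"),
    ("migration", "crop failure"),
]
acyclic_leftover = unsortable_nodes(chain)
cyclic_leftover = unsortable_nodes(with_feedback)
```

A non-empty result flags the region of the graph where feedback is present, which is where the dynamical descriptions mentioned above (differential equations, state-space models) would be a more natural fit than a static DAG.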
