From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Compositionality has long been considered a key explanatory property underlying human intelligence: arbitrary concepts can be composed into novel complex combinations, permitting the acquisition of an open-ended, potentially infinite expressive capacity from finite learning experiences. Influential arguments have held that neural networks fail to explain this aspect of behavior, leading many to dismiss them as viable models of human cognition. Over the last decade, however, modern deep neural networks (DNNs), which share the same fundamental design principles as their predecessors, have come to dominate artificial intelligence, exhibiting the most advanced cognitive behaviors ever demonstrated in machines. In particular, large language models (LLMs), DNNs trained to predict the next word on a large corpus of text, have proven capable of sophisticated behaviors such as writing syntactically complex sentences without grammatical errors, producing cogent chains of reasoning, and even writing original computer programs, all of which are behaviors thought to require compositional processing.
Introduction. Deep neural networks (DNNs) have made breakthrough after breakthrough in artificial intelligence (AI) over the last decade, reproducing sophisticated cognitive behaviors from advanced gameplay in board games like Go (Silver et al., 2016), chess (Silver et al., 2018), and Diplomacy (FAIR et al., 2022), to generating stunning images from arbitrary natural language instructions (Betker et al., 2023), to catalyzing achievements in mathematics (Romera-Paredes et al., 2024), science (Jumper et al., 2021), engineering (Merchant et al., 2023), and medicine (Omiye et al., 2023; Singhal et al., 2023). The large language models (LLMs) powering AI products like chatGPT (Brown et al., 2020; Bubeck et al., 2023; OpenAI et al., 2024) have shown remarkable proficiency in cognitive domains previously regarded as uniquely human, such as natural language syntax (Linzen and Baroni, 2021), critical reasoning and argumentation (Herbold et al., 2023), and computer programming (Chen et al., 2021; Bubeck et al., 2023).
Discussion / Conclusion. Classical cognitive theorists like Chomsky (1965) and Fodor (1975) developed compelling arguments that compositionality is a central property of human cognition, crucial for explaining the creativity, productivity, and systematicity of language and thought (see section 2.1). Neural network modeling was dismissed as an empirically inadequate explanatory paradigm because these models seem to lack the kind of combinatorial constituent structure intrinsic to classical architectures based on atomic symbols and governed by syntactic rules (see section 2.3; Fodor and Pylyshyn, 1988). Even as deep neural networks blew past competing frameworks in virtually every area of AI, initial investigations into their compositionality seemed to confirm traditional intuitions that they were not capable of replicating the kinds of compositional generalization that humans exhibit (see section 4.1; Lake and Baroni, 2018; Kim and Linzen, 2020). One way of understanding these initial findings is from the statistical learning perspective: compositional generalization requires out-of-distribution (o.o.d.)