Can neural networks actually achieve compositional generalization?
For decades, theorists argued connectionist models fundamentally lack the structure needed for compositionality. But modern LLMs exhibit sophisticated compositional behaviors despite sharing the same design principles. What changed?
Compositionality, the ability to combine arbitrary concepts into novel combinations and so gain open-ended expressive capacity, has long been held up as a property of human intelligence that neural networks cannot explain. That argument (Fodor & Pylyshyn) led many to dismiss neural networks as models of cognition. This review traces the debate from Frege through ChatGPT and presses the tension: modern DNNs, which share the same fundamental design principles as their dismissed predecessors, now dominate AI and exhibit behaviors thought to require compositional processing, including syntactically complex error-free sentences, cogent chains of reasoning, and original programs. The classical confidence that connectionism lacks the constituent structure for compositionality sits uneasily with this empirical record.
The keeper is the philosophical reframing: the question is no longer "can neural nets be compositional?" but why models without explicit symbolic constituent structure nonetheless produce compositional behavior, and what that implies for the symbolist/connectionist divide and for theories of human cognition.
This is a theory-anchor for Adrian's philosophy-of-mind thread. It complements the empirical note "Can neural networks learn compositional skills without symbolic mechanisms?" (the mechanism: scaling + training coverage) and the limit case in "Why do neural networks fail at compositional generalization?" (where compositionality still fails), framing the debate that the vault's "What happens to social order when AI removes ritual constraints?" map tracks.
Inquiring lines that use this note as a source (5)
This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.
- Why does recursion on latent state drive generalization better than hierarchy?
- What makes recurrent depth enable compositional generalization across tasks?
- How does scaling and training data enable compositional behavior without symbolic mechanisms?
- Where do neural networks still fail at compositional generalization despite scaling?
- How should we rethink the symbolism versus connectionism debate in light of LLMs?
Related concepts in this collection (3)
- Can neural networks learn compositional skills without symbolic mechanisms?
  Do neural networks need explicit symbolic architecture to compose learned concepts, or can scaling alone enable compositional generalization? This asks whether compositionality is an architectural feature or an emergent property of scale.
  the empirical mechanism behind the behavior this review philosophically frames
- Why do neural networks fail at compositional generalization?
  Exploring whether the binding problem from neuroscience explains neural networks' inability to systematically generalize. The binding problem has three aspects (segregation, representation, and composition), each creating distinct failure modes in how networks handle structured information.
  the limit case: where compositional generalization still breaks
- Are language models developing real functional competence or just formal competence?
  Neuroscience suggests formal linguistic competence (rules and patterns) and functional competence (real-world understanding) rely on different brain mechanisms. Can next-token prediction alone produce both, or does it leave functional competence behind?
  adjacent competence-debate framing for what next-token prediction does and doesn't yield
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Break It Down: Evidence for Structural Compositionality in Neural Networks
- From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
- Scaling can lead to compositional generalization
- Faith and Fate: Limits of Transformers on Compositionality
- Compositional Reasoning with Transformers, RNNs, and Chain of Thought
- Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
- Using Computational Models to Test Syntactic Learnability
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Original note title
modern neural networks challenge the classical argument that connectionist models cannot achieve compositional generalization