SYNTHESIS NOTE

Do neural networks naturally learn modular compositional structure?

Explores whether neural networks decompose compositional tasks into distinct subroutines without explicit symbolic design. Evidence that they do challenges the longstanding view that neural networks are fundamentally non-compositional.

Synthesis note · 2026-02-23 · sourced from MechInterp

Structural compositionality is the extent to which neural networks break down compositional tasks into subroutines and implement them in modular subnetworks. The alternative: matching inputs to learned templates without task decomposition.

The evidence supports structural compositionality. When model pruning is used to isolate subnetworks, the following holds:

Subnetworks that implement one subroutine can be identified
Ablating a subnetwork harms its corresponding subroutine while leaving others largely intact
This holds across multiple architectures (CNNs, transformers), tasks (vision, language), and scales
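
The ablation logic above can be illustrated with a minimal sketch. It assumes a hand-wired toy network rather than a trained one, and it uses a binary mask over hidden units as a stand-in for a pruned subnetwork; the names `W_HIDDEN`, `forward`, and `accuracy` are hypothetical and the setup is not the procedure used in the source paper. Hidden unit 0 implements subroutine A (AND of the first two inputs), and hidden units 1 and 2 together implement subroutine B (OR of the last two inputs).

```python
from itertools import product

# Hand-wired toy network (an illustrative assumption, not a trained model).
# Each entry is (input weights, bias) for one ReLU hidden unit.
W_HIDDEN = [
    ([1, 1, 0, 0], -1),  # h0 = relu(x0 + x1 - 1): fires only when x0 AND x1
    ([0, 0, 1, 1], 0),   # h1 = relu(x2 + x3)
    ([0, 0, 1, 1], -1),  # h2 = relu(x2 + x3 - 1); h1 - h2 equals x2 OR x3
]


def forward(x, mask):
    """Run the network with a binary mask over hidden units (0 = ablated)."""
    h = []
    for keep, (weights, bias) in zip(mask, W_HIDDEN):
        pre = sum(w * xi for w, xi in zip(weights, x)) + bias
        h.append(keep * max(0, pre))
    out_a = int(h[0] > 0.5)         # readout for subroutine A
    out_b = int(h[1] - h[2] > 0.5)  # readout for subroutine B
    return out_a, out_b


def accuracy(mask):
    """Return (accuracy on A, accuracy on B) over all 16 binary inputs."""
    inputs = list(product([0, 1], repeat=4))
    correct_a = correct_b = 0
    for x in inputs:
        out_a, out_b = forward(x, mask)
        correct_a += out_a == (x[0] & x[1])
        correct_b += out_b == (x[2] | x[3])
    return correct_a / len(inputs), correct_b / len(inputs)


print("intact   ", accuracy((1, 1, 1)))  # (1.0, 1.0)
print("ablate A ", accuracy((0, 1, 1)))  # (0.75, 1.0)
print("ablate B ", accuracy((1, 0, 0)))  # (1.0, 0.25)
```

In this toy case the decomposition is perfect by construction: removing one subnetwork degrades only its own subroutine. The residual accuracy after ablation (0.75 and 0.25) is just the rate at which a constant output of 0 happens to be correct.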

The pretraining effect: models initialized with pretrained weights more reliably produce modular subnetworks than randomly initialized models. Self-supervised pretraining appears to create internal structure that is more amenable to compositional decomposition. This suggests that the representations learned during pretraining have a modular quality that fine-tuning can exploit.

This provides empirical evidence against the longstanding objection that neural networks are fundamentally non-compositional. The finding: "some simple pseudo-symbolic computations might be learned directly from data using standard gradient-based optimization techniques." Explicit symbolic mechanisms may be unnecessary: gradient-based optimization can discover compositional structure when the task demands it and pretraining provides a good initialization.

The result is not perfect: "most do not exhibit perfect task decomposition." Compositionality is partial and graded, not all-or-nothing. Some architecture-task combinations show stronger structural compositionality than others.
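
One way to make "partial and graded" concrete is a selectivity score computed from the accuracy drops that follow an ablation. The sketch below is an assumption made for illustration: the function `selectivity` and its formula are hypothetical and are not the metric reported in the source paper. A score of 1.0 means the ablation hurt only the targeted subroutine, 0.0 means it hurt both equally, and negative values mean it hurt the other subroutine more.

```python
def selectivity(drop_target, drop_other):
    """Hypothetical graded modularity score in [-1, 1].

    drop_target: accuracy lost on the subroutine the subnetwork should implement.
    drop_other:  accuracy lost on a different subroutine after the same ablation.
    """
    total = drop_target + drop_other
    if total == 0:
        return 0.0  # the ablation changed nothing, so there is no evidence either way
    return (drop_target - drop_other) / total


print(selectivity(0.25, 0.0))   # 1.0: perfect task decomposition
print(selectivity(0.75, 0.25))  # 0.5: partial decomposition
print(selectivity(0.5, 0.5))    # 0.0: no selectivity
```

Under this framing, an architecture-task combination with stronger structural compositionality is one whose subnetworks score closer to 1.0.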

This connects to the weight-sparsity finding. The note "Can sparse weight training make neural networks interpretable by design?" shows that enforcing sparsity produces clean decomposition. The structural compositionality paper shows that decomposition also emerges naturally, albeit imperfectly, from standard training. Sparsity amplifies a tendency that already exists.

Inquiring lines that read this note 127

This note is a source for the research framings listed below.

What determines success in training models on multiple tasks?

Is embodied interaction necessary for language meaning and genuine agency?

Why does frame-activation matter more than word-by-word composition?

What are the consequences of models training on synthetic data?

Can world models form from aggregated partial information across training distributions?

Which computational strategies best support reasoning in language models?

Does recurrence enable reasoning capabilities that fixed-depth transformers cannot achieve?

What limits mechanistic interpretability's ability to characterize models?

How does reasoning graph topology affect breakthrough insights and generalization?

Do autonomous architecture discoveries follow predictable scaling laws?

Does model scaling alone produce compositional generalization without symbolic mechanisms?

What articulatory information do speech signals carry that text cannot?

Do language model representations contain causally steerable task-specific features?

How do training priors constrain what context information can override?

Can neural networks learn that A implies B in reverse?

Do language models develop causal world models or rely on statistical patterns?

How can identical external performance mask different internal representations?

How does sequence length affect sparsity tolerance in models?

What memory architectures best support persistent reasoning across extended interactions?

Do language models perform faithful symbolic reasoning independent of semantic grounding?

When does architectural design matter more than raw model capacity?

How do neural networks separate factual knowledge from reasoning abilities?

Does fine-tuning modify underlying model capabilities or only behavioral outputs?

How does policy entropy collapse constrain reasoning-focused reinforcement learning?

How does representational convergence differ from policy entropy collapse in iterative training?

How do transformer attention mechanisms implement memory and algorithmic functions?

How do attention patterns and circuits function as algorithmic representations?

Why do semantic similarity and task relevance diverge in vector embeddings?

Why do reasoning models fail at systematic problem-solving and search?

Why do long-context language models struggle with compositional reasoning tasks?

Why do continual learning scenarios trigger catastrophic forgetting and interference?

Does decoupling planning from execution improve multi-step reasoning accuracy?

Does AI fluency substitute for verifiable accuracy in human judgment?

What does a human-parseable framework for deep learning look like?

How does latent reasoning compare to verbalized chain-of-thought?

Why does recursion on latent state drive generalization better than hierarchy?

Can next-token prediction alone produce genuine language understanding?

What does next-token prediction tell us about compositional linguistic competence?

Why do benchmark improvements fail to reflect actual reasoning quality?

How does requential coding measure true simplicity without parameter count inflation?

Related concepts in this collection 3

Can sparse weight training make neural networks interpretable by design? Explores whether constraining most model weights to zero during training produces human-understandable circuits and disentangled representations, rather than attempting to reverse-engineer dense models after training.
sparsity amplifies the compositional decomposition that standard training already partially produces
Do base models already contain hidden reasoning ability? Explores whether reasoning capability emerges during pre-training as a latent feature rather than being created by post-training methods like reinforcement learning or fine-tuning.
pretraining-induced modularity is part of the "latent capability" that minimal signals can activate
Can neural networks learn compositional skills without symbolic mechanisms? Do neural networks need explicit symbolic architecture to compose learned concepts, or can scaling alone enable compositional generalization? This asks whether compositionality is an architectural feature or an emergent property of scale.
complementary evidence: scaling enables compositionality in behavior; pruning reveals it in structure
