INQUIRING LINE

How do neural networks decompose complex tasks into modular subnetworks?

This explores whether networks carve a hard task into separable pieces on their own — and whether that modularity is something they grow naturally, something you have to force, or something that's quietly faked.


The corpus splits into three camps, and reading them together is more interesting than any one alone. The first camp says modularity happens for free: prune a trained network and you find that compositional subroutines live in isolated subnetworks, so ablating one breaks only the function it implements, and pretraining makes this clean separation more reliable across architectures (Do neural networks naturally learn modular compositional structure?). In the same spirit, you don't even need a special architecture for the *behavior*: plain MLPs generalize compositionally once the training data covers enough combinations, and you can read the individual ingredients linearly off the hidden activations (Can neural networks learn compositional skills without symbolic mechanisms?).
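To make "reading ingredients linearly off the hidden activations" concrete, here is a minimal sketch of a linear probe in plain Python. Everything in it is an illustrative assumption rather than the setup of the cited notes: the "hidden activations" are synthetic 3-d vectors built as a fixed linear mix (`MIX`) of two made-up task ingredients plus small noise, and the probe is ordinary least squares solved through the normal equations.

```python
import random

def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not mutated.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_linear_probe(H, y):
    """Least-squares weights w minimizing sum((H w - y)^2) via normal equations."""
    d = len(H[0])
    HtH = [[sum(h[i] * h[j] for h in H) for j in range(d)] for i in range(d)]
    Hty = [sum(h[i] * t for h, t in zip(H, y)) for i in range(d)]
    return solve_linear_system(HtH, Hty)

def r_squared(H, y, w):
    """Fraction of variance in y explained by the linear readout H w."""
    preds = [sum(wi * hi for wi, hi in zip(w, h)) for h in H]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

rng = random.Random(0)
# Hypothetical stand-in for a trained network's hidden layer: each activation
# vector is a fixed linear mix of two ingredients (a, b) plus a little noise.
MIX = [[1.0, 0.5], [-0.5, 1.0], [0.3, -0.8]]
ingredients = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
H = [[m[0] * a + m[1] * b + rng.gauss(0, 0.01) for m in MIX] for a, b in ingredients]

train, test = slice(0, 150), slice(150, 200)
scores = {}
for name, idx in (("a", 0), ("b", 1)):
    y = [z[idx] for z in ingredients]
    w = fit_linear_probe(H[train], y[train])
    scores[name] = r_squared(H[test], y[test], w)  # held-out decodability
print(scores)
```

A held-out R² close to 1 for each ingredient is what "linearly decodable" means in this toy setting. On a real model the activations would come from a forward pass, and a low score would suggest the ingredient is either absent or encoded nonlinearly.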

The second camp doesn't wait for modularity to emerge; it builds it in. You can train transformers with sparse weights so neurons map to simple concepts with clear wiring, producing circuits that ablation confirms are both necessary and sufficient for the task, though keeping that interpretability beyond tens of millions of parameters remains unsolved (Can sparse weight training make neural networks interpretable by design?). At the architecture level, splitting a model into a separate 'decomposer' (plans the steps) and 'solver' (executes them) beats a single monolithic model, and the planning skill even transfers across domains while the solving skill doesn't (Does separating planning from execution improve reasoning accuracy?). The same instinct shows up in something as concrete as function calling, where breaking the job into seven explicit subtasks (nested calls, chaining, parallel functions, name detection, parameter detection, next-best function, and response generation) generalizes better than one undifferentiated dataset (Can breaking function calling into subtasks improve model generalization?). And circuit tracing inside Claude shows the network organizing itself into a four-tier hierarchy (tokens, then abstract concepts, then operations, then outputs), which is decomposition by depth rather than by isolated module (How do language models organize features across processing layers?).
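The decomposer/solver split can be sketched as an interface, with plain functions standing in for what would be separate learned models. This is only an illustration of the separation of concerns, not the cited work's implementation; `decomposer`, `number_solver`, `string_solver`, `run`, and the tiny "add/mul/sub" task language are all hypothetical names invented here.

```python
from typing import List, Tuple

Step = Tuple[str, int]

def decomposer(task: str) -> List[Step]:
    """Planner: parse 'add 3, mul 4, sub 2' into ordered (operation, amount) steps."""
    steps = []
    for chunk in task.split(","):
        op, amount = chunk.split()
        steps.append((op, int(amount)))
    return steps

def number_solver(state: int, step: Step) -> int:
    """Executor for the integer domain."""
    op, amount = step
    if op == "add":
        return state + amount
    if op == "mul":
        return state * amount
    if op == "sub":
        return state - amount
    raise ValueError(f"unknown operation: {op}")

def string_solver(state: str, step: Step) -> str:
    """Executor for a string domain: the same plan, different semantics."""
    op, amount = step
    if op == "add":
        return state + "x" * amount              # append characters
    if op == "mul":
        return state * amount                    # repeat the string
    if op == "sub":
        return state[: max(len(state) - amount, 0)]  # drop trailing characters
    raise ValueError(f"unknown operation: {op}")

def run(task, start, solver):
    """Plan once with the shared decomposer, then execute each step with a solver."""
    state = start
    for step in decomposer(task):
        state = solver(state, step)
    return state

plan = "add 3, mul 4, sub 2"
number_result = run(plan, 0, number_solver)   # ((0 + 3) * 4) - 2
string_result = run(plan, "", string_solver)  # "xxx" repeated 4 times, minus 2 chars
print(number_result, string_result)
```

The one `decomposer` serves both domains unchanged, while each domain needs its own solver. That mirrors, in miniature, the claim that planning transfers across domains and solving does not.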

The third camp is the skeptics, and they're the reason this question is worth asking. Transformers often *look* like they decompose tasks but are really memorizing computation subgraphs from training and gluing them together, which is why they collapse on genuinely novel combinations (Do transformers actually learn systematic compositional reasoning?). The deeper diagnosis is the binding problem: networks struggle to segregate entities, keep them representationally separate, and reuse that structure in new arrangements, which are the prerequisites for true modular composition (Why do neural networks fail at compositional generalization?). Most unsettling, two networks can produce identical outputs on every input while one has clean structure and the other has 'fractured, entangled' internals that can't transfer or recombine (Can identical outputs hide broken internal representations?), meaning a model can ace every benchmark and still understand nothing about how the pieces fit (Can AI pass every test while understanding nothing?).

The thing you didn't know you wanted to know: whether a network is genuinely modular is invisible from the outside. The same accuracy can sit on top of clean isolated subnetworks or on top of a memorized tangle, and the only way to tell them apart is to go inside: prune it, ablate it, trace its circuits. That's why the 'does it decompose?' question keeps circling back to interpretability tools rather than test scores. Worth noting too that the *sparsity* these tools rely on is itself learned, not fixed: networks default to sparse representations for unfamiliar inputs and dense ones for familiar data (Is representational sparsity learned or intrinsic to neural networks?), so how cleanly a model decomposes a task may depend on how much it has seen of that task.
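A toy version of that point, as a sketch under stated assumptions: two hand-wired two-layer linear networks (`CLEAN` and `TANGLED`, both invented for illustration) compute the same two functions, a sum and a difference, and agree on every input. Only ablating hidden units reveals that one keeps each function in its own unit while the other smears both across both units. Real networks are nonlinear and far larger; this only shows why output agreement cannot settle the question.

```python
from itertools import product

def forward(W1, W2, x, ablate=None):
    """Two-layer linear network; optionally zero out one hidden unit."""
    hidden = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    if ablate is not None:
        hidden[ablate] = 0  # knock out a single hidden unit
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

# Both networks compute output 0 = x0 + x1 and output 1 = x0 - x1.
# CLEAN: hidden unit 0 holds the sum, hidden unit 1 holds the difference.
CLEAN = ([[1, 1], [1, -1]], [[1, 0], [0, 1]])
# TANGLED: the same overall map, but each hidden unit mixes both functions
# (the second-layer weights undo the mixing).
TANGLED = ([[3, 1], [2, 0]], [[1, -1], [-1, 2]])

INPUTS = list(product(range(-2, 3), repeat=2))

def affected_outputs(net, unit):
    """Indices of outputs that change on at least one input when `unit` is ablated."""
    W1, W2 = net
    hit = set()
    for x in INPUTS:
        base = forward(W1, W2, x)
        cut = forward(W1, W2, x, ablate=unit)
        hit |= {i for i, (b, c) in enumerate(zip(base, cut)) if b != c}
    return hit

same_behaviour = all(forward(*CLEAN, x) == forward(*TANGLED, x) for x in INPUTS)
clean_map = {u: affected_outputs(CLEAN, u) for u in range(2)}
tangled_map = {u: affected_outputs(TANGLED, u) for u in range(2)}

print(same_behaviour)  # True
print(clean_map)       # {0: {0}, 1: {1}}
print(tangled_map)     # {0: {0, 1}, 1: {0, 1}}
```

From the outside the two are indistinguishable. The ablation map is what separates them: in `CLEAN` each unit breaks exactly one function, while in `TANGLED` every unit breaks both.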


Sources (11 notes)

Do neural networks naturally learn modular compositional structure?

Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.

Can neural networks learn compositional skills without symbolic mechanisms?

Standard MLPs achieve compositional generalization through data and model scaling alone, without architectural modifications, provided the training distribution sufficiently covers combinations of task modules. Linear decodability of constituents from hidden activations reliably predicts success.

Can sparse weight training make neural networks interpretable by design?

Training transformers with sparse weights creates compact, human-interpretable circuits where neurons correspond to simple concepts with clear connections. Ablation studies confirm these circuits are necessary and sufficient for task performance, though scaling beyond tens of millions of parameters while maintaining interpretability remains unsolved.

Does separating planning from execution improve reasoning accuracy?

Modular architectures with separate decomposer and solver models outperform monolithic LLMs, with decomposition ability transferring across domains while solving ability does not. The separation prevents planning-execution interference and produces more generalizable skills.

Can breaking function calling into subtasks improve model generalization?

Granite-20B-FunctionCalling shows that explicit training across seven granular subtasks—nested calls, chaining, parallel functions, name detection, parameter detection, next-best function, and response generation—generalizes better than umbrella datasets like ToolLLM. This multi-task approach closes the performance gap with GPT, Claude, and Gemini.

How do language models organize features across processing layers?

Circuit tracing in Claude models reveals features progress from token-level inputs to abstract concepts to functional operations to outputs. Larger models develop richer abstract features, suggesting scaling enables higher-level conceptual reasoning rather than pattern memorization.

Do transformers actually learn systematic compositional reasoning?

Research shows transformers succeed on in-distribution tasks by memorizing computation subgraphs from training data, not by learning systematic rules. They fail drastically on novel compositions, with errors compounding across reasoning steps.

Why do neural networks fail at compositional generalization?

Greff et al. argue that neural networks cannot dynamically bind distributed information into compositional structures due to three failures: segregating entities from inputs, maintaining representational separation, and reusing learned structure in novel combinations. Scaling can partially overcome this by enabling compositional representations to emerge.

Can identical outputs hide broken internal representations?

Networks trained with SGD reproduce outputs perfectly while having radically different internal structure than evolved networks, with weight perturbations revealing fractured, entangled representations that prevent transfer to novel contexts or creative recombination.

Can AI pass every test while understanding nothing?

The Fractured Entangled Representation hypothesis holds that SGD-trained networks can produce identical outputs across all inputs while maintaining radically different internal representations. Standard benchmarks cannot detect this structural difference.

Is representational sparsity learned or intrinsic to neural networks?

During pretraining, neural networks develop dense activations for familiar training data and default to sparse representations for unfamiliar inputs. This trend emerges without task-specific fine-tuning and reflects how models consolidate knowledge through exposure.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about neural network modularity against the latest evidence. The question remains open: Do neural networks genuinely decompose complex tasks into separable, reusable subnetworks, or do they memorize and glue computation fragments together?

What a curated library found — and when (dated claims, not current truth):
Findings span 2020–2026; treat each as a time-stamped boundary, not ground truth.

• Networks naturally carve tasks into isolated, ablatable subnetwork modules during pretraining; pruning reveals compositional subroutines (2023–2024).
• Sparse weight structures produce interpretable disentangled circuits where neurons map to single concepts with verified necessity and sufficiency (2025).
• Explicit architectural decomposition — separating 'decomposer' (planning) from 'solver' (execution) — transfers planning skill across domains better than monolithic models (2024).
• Networks *appear* modular but often memorize subgraphs from training and fail on novel combinations; identical outputs can mask either clean structure or fractured entanglement (2025).
• Sparsity is learned and data-dependent: networks use dense representations for familiar tasks, sparse ones for unfamiliar inputs (2026).

Anchor papers (verify; mind their dates):
• arXiv:2012.05208 (2020) — the binding problem as a core constraint on compositionality.
• arXiv:2505.11581 (2025) — fractured entanglement: identical performance masks internal disorder.
• arXiv:2511.13653 (2025) — weight sparsity yields interpretable circuits.
• arXiv:2603.03415 (2026) — OOD sparsity patterns.

Your task:
(1) RE-TEST the modular-emergence vs. memorization divide. For each claim above, probe whether newer training methods (curriculum learning, synthetic compositional data at scale), architectural innovations (Titans-style test-time memory, sparse-by-default designs), or circuit-tracing tooling have since RELAXED the binding problem or sharpened separation between genuine modularity and illusory structure. Pinpoint what *still* appears unresolved.
(2) Surface work from the last 6 months that CONTRADICTS the optimism of the first camp (modularity for free) or the skeptics' pessimism. Look for reconciliations or surprises.
(3) Propose 2 research questions that assume the regime may have shifted: one on *verifying* modularity without ablation, one on *inducing* modularity synthetically.

Cite arXiv IDs; flag anything you cannot ground in a real paper.
