How do neural networks decompose tasks into modular subnetworks that transfer?
This note explores whether neural networks split tasks into reusable modular pieces on their own, and what determines whether those pieces actually transfer to new tasks rather than just memorizing the old ones.
This question sits on a fault line in the corpus: some work finds that networks spontaneously carve tasks into clean, reusable modules, and other work finds that what looks modular is often memorization in disguise. Both turn out to be true under different conditions, and the gap between them is exactly where 'transfer' lives.
On the optimistic side, pruning experiments show that networks naturally implement compositional subroutines inside isolated subnetworks: ablate one and only its corresponding function breaks, leaving the rest intact (Do neural networks naturally learn modular compositional structure?). Crucially, pretraining sharpens this: the modular structure becomes more consistent and reliable across architectures and domains. That reusable scaffolding is what makes transfer possible. The clearest evidence is in length generalization, where models trained jointly on related tasks reuse the *same attention heads*, so a shorter task can borrow machinery to extrapolate beyond its own training length, and pretrained models already ship with that scaffolding built in (Can length generalization transfer between different related tasks?). You can even force this modularity rather than wait for it: training with sparse weights produces compact circuits where neurons map to simple concepts, and ablation confirms each circuit is necessary and sufficient for its task (Can sparse weight training make neural networks interpretable by design?), though keeping that interpretability beyond tens of millions of parameters remains unsolved.
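To make the ablation logic concrete, here is a minimal sketch, assuming a hand-wired toy network rather than anything learned or taken from the cited experiments. The function `forward`, the `mask` argument, and the assignment of hidden units to an "add" and an "absolute difference" subnetwork are all hypothetical choices made for illustration.

```python
# Toy illustration only: the weights are set by hand, not learned.
# Hidden units 0-1 form a hypothetical "sum" subnetwork,
# hidden units 2-3 form a hypothetical "absolute difference" subnetwork.

def relu(v):
    return max(0.0, v)

def forward(a, b, mask=(1, 1, 1, 1)):
    """Return (a + b, |a - b|); mask[i] = 0 ablates hidden unit i."""
    pre = [a + b, -(a + b), a - b, b - a]            # pre-activations
    hidden = [m * relu(p) for m, p in zip(mask, pre)]
    total = hidden[0] - hidden[1]                    # relu(s) - relu(-s) = s
    abs_diff = hidden[2] + hidden[3]                 # relu(d) + relu(-d) = |d|
    return total, abs_diff

intact = forward(3.0, 5.0)                           # both functions work
no_diff = forward(3.0, 5.0, mask=(1, 1, 0, 0))       # ablate units 2-3
no_sum = forward(3.0, 5.0, mask=(0, 0, 1, 1))        # ablate units 0-1
```

Ablating units 2 and 3 zeroes the absolute difference while the sum is unchanged, and the reverse holds for units 0 and 1. That selective breakage is the signature the pruning experiments look for; in a real network the subnetworks would have to be discovered rather than assigned.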
But here's the thing the question doesn't anticipate: the parts of a decomposed task do not transfer equally. When researchers split a reasoning system into a 'decomposer' (breaks the problem into steps) and a 'solver' (executes them), the decomposition skill transfers across domains while the solving skill does not (Does separating planning from execution improve reasoning accuracy?). Planning is the portable module; execution stays parochial. The same flavor of insight drives function-calling work, where carving one task into seven explicit subtasks and training on them jointly generalizes better than a single umbrella dataset (Can breaking function calling into subtasks improve model generalization?). Modularity isn't just emergent: you can engineer it by naming the seams.
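The decomposer/solver split can be pictured as an interface boundary. The sketch below is a hypothetical toy, not the cited system: `decompose`, `run`, `solve_arithmetic`, and `solve_strings` are invented names, and the "planning" here is trivial. It only shows the shape of the claim, with one decomposer reused unchanged across two domains and a separate solver for each.

```python
# Hypothetical toy: one shared decomposer, one solver per domain.

PREV = object()  # sentinel meaning "the result of the previous step"

def decompose(items):
    """Plan a left-to-right chain of binary steps over two or more items."""
    steps = [(items[0], items[1])]
    for item in items[2:]:
        steps.append((PREV, item))
    return steps

def run(steps, solve_step):
    """Execute a plan with a domain-specific solver."""
    result = None
    for left, right in steps:
        if left is PREV:
            left = result
        result = solve_step(left, right)
    return result

def solve_arithmetic(x, y):
    return x + y                  # domain-specific execution

def solve_strings(x, y):
    return x + "-" + y            # a different domain, a different solver

numeric = run(decompose([1, 2, 3]), solve_arithmetic)
textual = run(decompose(["a", "b", "c"]), solve_strings)
```

The plan produced by `decompose` never inspects what the items are, which is why it carries over; each solver is tied to its own data type, which is why it does not.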
Now the pessimistic side, which is what makes 'that transfer' the load-bearing phrase in your question. Transformers often *appear* compositional while actually memorizing computation subgraphs from training data: they succeed in-distribution and then fail drastically on novel combinations, with errors compounding step by step (Do transformers actually learn systematic compositional reasoning?). The deeper diagnosis is the binding problem: networks struggle to segregate entities, keep their representations separate, and recombine them in new ways, the three things genuine modular transfer requires (Why do neural networks fail at compositional generalization?). And the most unsettling result: two networks can produce *identical outputs on every input* while one has clean structure and the other has 'fractured, entangled' internals, and it's precisely the fractured one that can't transfer to novel contexts or recombine creatively (Can identical outputs hide broken internal representations?). Benchmarks can't see this difference at all (Can AI pass every test while understanding nothing?).
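A minimal sketch of how identical outputs can hide different internals, assuming two hand-built linear toy networks rather than the SGD-trained and evolved networks from the cited work. The names `clean_net` and `entangled_net` and the `gain` perturbation are hypothetical. Both networks copy their two inputs to their two outputs, but they route the computation through different hidden codes.

```python
# Toy illustration only: two linear "networks" that agree on every input.
# gain[i] scales hidden unit i; gain=(1.0, 1.0) is the unperturbed network.

def clean_net(a, b, gain=(1.0, 1.0)):
    # Each hidden unit carries exactly one input.
    h = [a * gain[0], b * gain[1]]
    return h[0], h[1]

def entangled_net(a, b, gain=(1.0, 1.0)):
    # Each hidden unit mixes both inputs; the readout unmixes them.
    h = [(a + b) * gain[0], (a - b) * gain[1]]
    return (h[0] + h[1]) / 2, (h[0] - h[1]) / 2

same = clean_net(3.0, 5.0) == entangled_net(3.0, 5.0)   # outputs agree

# Perturb only hidden unit 0 in each network.
clean_hit = clean_net(3.0, 5.0, gain=(2.0, 1.0))
entangled_hit = entangled_net(3.0, 5.0, gain=(2.0, 1.0))
```

With the perturbation applied, `clean_net` changes only its first output, while `entangled_net` changes both. No input-output test on the unperturbed networks can tell them apart, which is the point of the paragraph above.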
The synthesis worth taking away: decomposition into transferable modules is real, but it's a property of *internal structure*, not output behavior, and the two can diverge completely. Scaling helps by making compositional representations emerge when training covers enough of the combination space (Can neural networks learn compositional skills without symbolic mechanisms?), but coverage that produces correct answers does not guarantee the clean, separable circuitry that lets a module move to a task it never saw. The frontier question isn't 'can networks be modular'; it's 'how do we tell true modularity from a convincing forgery,' since the thing that transfers is invisible to the tests we usually trust.
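One way to look at internal structure instead of outputs is a linear probe, the kind of decodability check the scaling note reports as predictive of success. The sketch below is a toy under stated assumptions: the "hidden activations" are hand-constructed, the probe is a plain perceptron, accuracy is measured on the same four points it was fit on, and all names (`train_probe`, `probe_accuracy`, `separable`, `entangled`) are hypothetical.

```python
from itertools import product

def predict(w, bias, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + bias > 0 else 0

def train_probe(xs, ys, epochs=100):
    """Fit a perceptron, i.e. a linear classifier, on activation vectors."""
    w, bias = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = y - predict(w, bias, x)
            w = [wi + err * xi for wi, xi in zip(w, x)]
            bias += err
    return w, bias

def probe_accuracy(xs, ys):
    # Training-set accuracy only; a real probe needs held-out data.
    w, bias = train_probe(xs, ys)
    return sum(predict(w, bias, x) == y for x, y in zip(xs, ys)) / len(xs)

# Two binary constituents (p, q); the probe tries to read out p.
pairs = list(product([0, 1], repeat=2))
labels = [p for p, q in pairs]

# Hand-made "activations": p is linearly recoverable in the first code,
# but stored as p XOR q in the second, so no linear readout recovers it.
separable = [[p + 0.5 * q, q] for p, q in pairs]
entangled = [[p ^ q, q] for p, q in pairs]

clean_acc = probe_accuracy(separable, labels)
entangled_acc = probe_accuracy(entangled, labels)
```

Both codes contain full information about `p`, yet the probe reaches perfect accuracy only on the separable one; on the XOR code no linear classifier can exceed three out of four. A probe like this measures the representation's form, which output accuracy alone does not reveal.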
Sources (10 notes)
Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.
Models trained jointly on related tasks reuse the same attention heads to handle length generalization, allowing shorter tasks to extrapolate beyond their training length. Pretrained models already contain this reusable computational scaffolding.
Training transformers with sparse weights creates compact, human-interpretable circuits where neurons correspond to simple concepts with clear connections. Ablation studies confirm these circuits are necessary and sufficient for task performance, though scaling beyond tens of millions of parameters while maintaining interpretability remains unsolved.
Modular architectures with separate decomposer and solver models outperform monolithic LLMs, with decomposition ability transferring across domains while solving ability does not. The separation prevents planning-execution interference and produces more generalizable skills.
Granite-20B-FunctionCalling shows that explicit training across seven granular subtasks—nested calls, chaining, parallel functions, name detection, parameter detection, next-best function, and response generation—generalizes better than umbrella datasets like ToolLLM. This multi-task approach closes the performance gap with GPT, Claude, and Gemini.
Research shows transformers succeed on in-distribution tasks by memorizing computation subgraphs from training data, not by learning systematic rules. They fail drastically on novel compositions, with errors compounding across reasoning steps.
Greff et al. argue that neural networks cannot dynamically bind distributed information into compositional structures due to three failures: segregating entities from inputs, maintaining representational separation, and reusing learned structure in novel combinations. Scaling can partially overcome this by enabling compositional representations to emerge.
Networks trained with SGD reproduce outputs perfectly while having radically different internal structure than evolved networks, with weight perturbations revealing fractured, entangled representations that prevent transfer to novel contexts or creative recombination.
The Fractured Entangled Representation hypothesis shows that SGD-trained networks can produce identical outputs across all inputs while maintaining radically different internal representations. Standard benchmarks cannot detect this structural difference.
Standard MLPs achieve compositional generalization through data and model scaling alone, without architectural modifications, provided the training distribution sufficiently covers combinations of task modules. Linear decodability of constituents from hidden activations reliably predicts success.