SYNTHESIS NOTE

Can deep learning theory unify around training dynamics?

Is learning mechanics—focused on average-case predictions and training dynamics rather than worst-case bounds—the emerging framework that finally unifies fragmented deep learning theory?

Synthesis note · 2026-05-18 · sourced from Foundation Models

Deep learning is the most powerful and most inscrutable member of the machine learning pantheon. Decades of attempts to put rigorous theoretical backing behind it have produced fragments (solvable toy models, scaling laws, hyperparameter limits, universal behaviors) but no unified frame. The argument in "There Will Be a Scientific Theory of Deep Learning" is that these fragments are not isolated; they are converging into a single discipline that the authors call learning mechanics.

Five strands point at the unification: (1) solvable idealized settings provide intuition for realistic systems, (2) tractable limits reveal fundamental phenomena, (3) simple mathematical laws capture macroscopic observables, (4) hyperparameter theories disentangle which parameters drive behavior, and (5) universal behaviors across systems clarify which phenomena need explanation. Each of these mirrors a move that classical, continuum, statistical, or quantum mechanics made for physical systems. The analogy is structural, not rhetorical: both fields develop libraries of solvable settings, both work with aggregate statistics rather than per-particle motion, both treat system parameters as first-class objects, and both encounter universality across regimes.
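To make strand (1) concrete, here is a minimal sketch of one textbook solvable setting: plain gradient descent on a diagonal quadratic loss, where every coordinate decays geometrically and the whole trajectory can be predicted without running the training loop. This example is an illustration added here, not one taken from the paper; the function names and all numeric values are hypothetical.

```python
# Gradient descent on a diagonal quadratic loss, a standard solvable toy setting.
#   L(w) = 0.5 * sum_i h_i * (w_i - w_star_i)^2
# Closed-form trajectory:
#   w_i(t) - w_star_i = (1 - lr * h_i)^t * (w_i(0) - w_star_i)

def simulate(h, w_star, w0, lr, steps):
    """Run plain gradient descent and return the final weights."""
    w = list(w0)
    for _ in range(steps):
        # dL/dw_i = h_i * (w_i - w_star_i)
        w = [wi - lr * hi * (wi - si) for wi, hi, si in zip(w, h, w_star)]
    return w

def closed_form(h, w_star, w0, lr, steps):
    """Predict the same weights from the analytic solution, with no training loop."""
    return [si + (1.0 - lr * hi) ** steps * (wi - si)
            for wi, hi, si in zip(w0, h, w_star)]

# Hypothetical values chosen only for illustration.
H = [4.0, 1.0, 0.1]          # per-coordinate curvatures
W_STAR = [1.0, -2.0, 0.5]    # loss minimizer
W0 = [0.0, 0.0, 0.0]         # initial weights
LR = 0.2                     # learning rate
STEPS = 25

simulated = simulate(H, W_STAR, W0, LR, STEPS)
predicted = closed_form(H, W_STAR, W0, LR, STEPS)

# Largest disagreement between the simulated and the predicted weights.
max_gap = max(abs(a - b) for a, b in zip(simulated, predicted))
```

The simulated and predicted weights agree up to floating-point error, and the high-curvature coordinate converges much faster than the low-curvature one. Intuition of this kind, exact in the toy model, is what the first strand hopes to carry over to realistic systems.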

The methodological consequence is sharp. Learning mechanics aims at average-case predictions rather than rigorous worst-case bounds. This is a distinct epistemic project from learning theory's PAC-style guarantees and from interpretability's per-circuit causal accounts. It is concerned with what happens during training, with dynamics rather than endpoints, and with phenomena that are robust across architecture and dataset choices.

The paper anticipates a complementary relationship with mechanistic interpretability: "where mechanistic interpretability aims to be the biology of deep learning, learning mechanics should aspire to be its physics." Mechanistic interpretability dissects specific circuits in specific models; learning mechanics characterizes the dynamics any sufficiently large network exhibits during training. Both are necessary; neither is sufficient alone.

Inquiring lines that read this note 10

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How does policy entropy collapse constrain reasoning-focused reinforcement learning?

What stability techniques prevent collapse in policy-critic adversarial training?

What limits mechanistic interpretability's ability to characterize models?

How can identical external performance mask different internal representations?

Why should deep learning theory prioritize average-case over worst-case analysis?

Why do continual learning scenarios trigger catastrophic forgetting and interference?

How do self-generated feedback mechanisms enable effective model learning?

What distinguishes surface mechanisms from the training regimes that produce them?

Related concepts in this collection 4

Do language models understand in fundamentally different ways? Does mechanistic evidence reveal distinct tiers of understanding in LLMs—from concept recognition to factual knowledge to principled reasoning? And do these tiers coexist rather than replace each other?
adjacent: how to read mechanistic evidence at multiple levels
Can cognitive science methods unlock how LLMs actually work? Does Marr's three-level framework—developed to understand biological minds—offer interpretability researchers the structured methodology they need to decode opaque language models?
adjacent framing: cognitive science methods for LLM interpretation; learning mechanics is the dynamics-of-training axis Marr's framework does not directly address
Can humans understand deep learning before AI does? Explores whether investing in human-parseable deep learning theory remains valuable even if AI systems eventually develop their own self-understanding. Centers on why this matters for safety oversight.
same paper, the safety argument that motivates pursuing the theory now
Why do Shannon and Kolmogorov measures fail to value data? Shannon information and Kolmogorov complexity assume unlimited computational capacity. But do these classical measures actually capture what bounded learners can extract from real data?
exemplifies: the compute-aware average-case turn applied to information theory
