SYNTHESIS NOTE
Model Architecture and Internals

Can learnable spline activations beat fixed MLP designs?

What if neural networks moved nonlinearity from fixed node activations to learnable functions on edges? This note explores whether that structural redesign can improve accuracy, interpretability, and scaling relative to standard MLPs.

Synthesis note · 2026-06-03 · sourced from MechInterp

MLPs, which place fixed activations on nodes and learnable linear weights on edges, are the default nonlinear approximator and make up the bulk of a transformer's non-embedding parameters, yet they are hard to interpret. Inspired by the Kolmogorov-Arnold representation theorem, Kolmogorov-Arnold Networks (KANs) invert the design: they have no linear weights at all. Every weight is replaced by a learnable univariate function (parametrized as a spline) on an edge, and nodes carry no fixed activation; they simply sum their incoming signals. The paper makes three claims for this seemingly small change: much smaller KANs match or beat much larger MLPs on data fitting and PDE solving; KANs follow faster neural scaling laws; and KANs are interpretable, in that they can be visualized and can act as "collaborators" helping scientists rediscover mathematical and physical laws.
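The node-versus-edge contrast can be made concrete with a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the names `hat_basis`, `mlp_layer`, and `kan_layer` are hypothetical, and a piecewise-linear (degree-1) spline on a fixed uniform grid stands in for the higher-order B-splines used in the KAN paper.

```python
import numpy as np

def hat_basis(x, grid):
    """Piecewise-linear (degree-1 spline) basis on a uniform grid.

    x: (batch, n_in), grid: (n_grid,) -> (batch, n_in, n_grid).
    Assumption: a simplified stand-in for higher-order B-splines.
    """
    h = grid[1] - grid[0]
    return np.clip(1.0 - np.abs(x[..., None] - grid) / h, 0.0, None)

def mlp_layer(x, W, b):
    # MLP: learnable linear weights on edges, fixed nonlinearity on nodes.
    return np.tanh(x @ W + b)

def kan_layer(x, coef, grid):
    # KAN (sketch): edge (i -> j) carries its own learnable 1-D function
    #   phi_ij(x_i) = sum_g coef[i, j, g] * basis_g(x_i)
    # and each output node only sums its incoming edge functions.
    B = hat_basis(x, grid)                    # (batch, n_in, n_grid)
    return np.einsum("big,ijg->bj", B, coef)  # sum over inputs i and grid g

# Usage: 2 inputs, 3 outputs, 5 grid points on [-1, 1].
grid = np.linspace(-1.0, 1.0, 5)
# Setting each edge's coefficients to the grid values makes phi_ij the
# identity on [-1, 1], so every output node returns the sum of the inputs.
coef = np.broadcast_to(grid, (2, 3, 5)).copy()
x = np.array([[0.25, -0.5]])
y = kan_layer(x, coef, grid)  # shape (1, 3); each entry is 0.25 + (-0.5)
```

In this sketch the KAN layer stores `n_in * n_out * n_grid` coefficients where the MLP layer stores `n_in * n_out + n_out` weights, so each edge is more expensive; the paper's claim is that far fewer edges are needed overall.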

The keeper is the architectural bet: moving nonlinearity from nodes to learnable edge functions trades the MLP's opacity for a structure that can be inspected and, per the paper, scales better. That makes KANs a genuine alternative to the MLP monoculture, at least in science-adjacent regimes; the paper is candid that the theory for deep KANs is still thin.

This sits in the vault's architecture/interpretability thread as a structural alternative. It rhymes with the inductive-bias-over-capacity lesson of "Why does dot product beat MLP-based similarity in practice?" (the right structural prior beats raw MLP capacity), and it offers an interpretability-by-construction contrast to post-hoc methods such as the dictionary learning discussed in "Can dictionary learning scale to production language models?"

Original note title

Kolmogorov-Arnold Networks put learnable spline activations on edges and beat MLPs on accuracy interpretability and scaling