Do larger models develop more abstract features than smaller ones?
This explores whether scaling up model size actually grows more abstract internal representations — and the corpus complicates a simple 'yes' by separating where abstraction lives from how big the model is.
The most direct evidence says yes. Circuit tracing inside Claude models reveals a four-tier hierarchy (token-level inputs, then abstract concepts, then functional operations, then outputs), and larger models develop richer features in the upper, more abstract tiers. This suggests scale buys higher-level conceptual reasoning rather than just more memorized patterns (see: How do language models organize features across processing layers?). So abstraction does seem to track size, at least in this layered sense.
But the corpus immediately pushes back on the assumption that size is the *cause*. Abstraction appears to come from depth, not raw parameter count: at the sub-billion scale, deep-and-thin architectures beat balanced designs precisely because stacking more layers lets a model *compose* abstract concepts through them, rather than spreading parameters across width (see: Does depth matter more than width for tiny language models?). That reframes the question. It is the number of processing stages an idea passes through that builds abstraction, and bigger models tend to be deeper, so they get more of it almost as a side effect.
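To make the depth-versus-width trade concrete, here is a minimal sketch of two transformer stacks with the same rough weight budget but very different numbers of processing stages. The layer counts and widths are hypothetical, picked only so the budgets match, and the `12 * d_model**2` per-layer estimate assumes a standard block with a 4x MLP expansion and ignores embeddings, biases, and layer norms. It is not the configuration used in the cited work.

```python
def approx_block_params(n_layers: int, d_model: int) -> int:
    """Rough weight count for a stack of standard transformer blocks.

    Assumes ~4*d^2 weights for attention projections and ~8*d^2 for a
    4x-expansion MLP, i.e. about 12*d^2 per layer. Embeddings, biases and
    layer norms are ignored.
    """
    return 12 * n_layers * d_model * d_model


# Hypothetical configurations, chosen only so the budgets are equal.
deep_thin = {"n_layers": 32, "d_model": 512}
shallow_wide = {"n_layers": 8, "d_model": 1024}

deep_params = approx_block_params(**deep_thin)
wide_params = approx_block_params(**shallow_wide)

print(f"deep-and-thin: {deep_thin['n_layers']} layers, {deep_params:,} weights")
print(f"shallow-wide:  {shallow_wide['n_layers']} layers, {wide_params:,} weights")
```

Under these assumptions both stacks hold the same number of weights, but the deep-and-thin one passes each token through four times as many sequential layers, which is the quantity the paragraph above ties to composing abstractions.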
There's also a sharp warning against reading 'more abstract' off of performance numbers. A model can hit perfect accuracy while its internal organization is fractured and broken: the features needed for the task are linearly decodable, but the underlying structure is fragile and invisible to standard metrics (see: Can models be smart without organized internal structure?). So a smaller model that scores well isn't necessarily organizing concepts cleanly, and a bigger one scoring better isn't proof of richer abstraction either. This pairs with the finding that the famous 'emergent abilities' of large models are often metric artifacts: switch from a harsh pass/fail metric to a continuous one and the sudden capability jumps smooth into gradual, predictable improvement (see: Are LLM emergent abilities real or measurement artifacts?). Abstraction with scale may grow steadily, not in dramatic leaps.
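The metric-artifact point can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that each token of an answer is correct independently with the same probability; the answer length and the accuracy values are hypothetical and not taken from any of the cited notes. Under that assumption a pass/fail exact-match score is the per-token accuracy raised to the answer length, so a smooth rise in the continuous metric looks like a sudden jump in the discontinuous one.

```python
def exact_match_rate(per_token_acc: float, answer_len: int) -> float:
    """Probability that every token is right, assuming independent tokens."""
    return per_token_acc ** answer_len


# Hypothetical, smoothly improving per-token accuracies (e.g. with scale).
per_token = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
answer_len = 10  # hypothetical answer length in tokens

exact = [exact_match_rate(p, answer_len) for p in per_token]

print("per-token acc | exact match")
for p, em in zip(per_token, exact):
    print(f"{p:13.2f} | {em:.4f}")
```

The per-token column climbs steadily, while the exact-match column stays near zero for most of the range and then rises steeply: the same underlying improvement, scored two ways.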
The most surprising thread: abstraction can be added without scaling at all. A 1.5B model with only a lightweight LoRA adapter matched larger full-parameter RL-trained models on reasoning, implying that what 'reasoning training' teaches is often output *format* and organization rather than new knowledge, and that the machinery for abstract reasoning and the store of factual knowledge are separable (see: Can small models reason well by just learning output format?). Relatedly, abstractions can be trained as an explicit object: jointly generating abstractions and solutions creates structured breadth-first exploration that depth-only reasoning chains can't reach (see: Can abstractions guide exploration better than depth alone?). Abstraction, in other words, is a skill you can install, not only a property that emerges from mass.
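To give a sense of how lightweight a LoRA adapter is, here is a minimal sketch of the generic low-rank arithmetic: instead of updating a full `d_out x d_in` weight matrix, LoRA trains two factors of shapes `d_out x r` and `r x d_in`. The matrix size and rank below are hypothetical; the source does not state which matrices were adapted or at what rank, so this is not the cited study's setup.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a rank-r LoRA update: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_in + d_out)


# Hypothetical square projection matrix and adapter rank.
d_model = 2048
rank = 8

full = full_params(d_model, d_model)
lora = lora_params(d_model, d_model, rank)
fraction = lora / full

print(f"full update: {full:,} weights")
print(f"LoRA update: {lora:,} weights ({fraction:.2%} of full)")
```

With these illustrative numbers the adapter trains under one percent of the weights of the matrix it modifies, which is why a result where such an adapter matches full-parameter training points toward reorganizing existing capability rather than storing new knowledge.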
And bigger isn't strictly better even on its own turf. For generating diverse outputs, models of around 500M parameters beat larger ones, because large models concentrate probability mass on preferred outputs and collapse variety (see: Why aren't bigger models better for generating diverse outputs?). The takeaway: larger models do appear to build more abstract features, but that is downstream of depth and composition, it shows up smoothly rather than as magic emergence, and it can be partly grafted onto small models through architecture and training. So 'abstract' and 'big' are correlated, not the same thing.
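The diversity point above can be sketched with a toy calculation. For a fixed sampling budget, the expected number of distinct outputs drawn from a distribution is the sum over outputs of the chance each one appears at least once. The two distributions below are hypothetical stand-ins for a flatter and a more peaked output distribution; they are not measurements of any real model.

```python
def expected_distinct(probs: list[float], n_samples: int) -> float:
    """Expected number of distinct outcomes seen in n independent samples."""
    return sum(1.0 - (1.0 - p) ** n_samples for p in probs)


n_outputs = 100  # hypothetical number of possible outputs
budget = 50      # hypothetical fixed sampling budget

# Flat: every output equally likely.
flat = [1.0 / n_outputs] * n_outputs
# Peaked: 90% of the mass on one preferred output, the rest spread evenly.
peaked = [0.9] + [0.1 / (n_outputs - 1)] * (n_outputs - 1)

flat_distinct = expected_distinct(flat, budget)
peaked_distinct = expected_distinct(peaked, budget)

print(f"flat distribution:   {flat_distinct:.1f} distinct outputs expected")
print(f"peaked distribution: {peaked_distinct:.1f} distinct outputs expected")
```

In this toy setting the peaked distribution yields far fewer distinct samples for the same budget, which is the mechanism the paragraph describes: concentrating probability mass trades variety for consistency.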
Sources (7 notes)
Circuit tracing in Claude models reveals features progress from token-level inputs to abstract concepts to functional operations to outputs. Larger models develop richer abstract features, suggesting scaling enables higher-level conceptual reasoning rather than pattern memorization.
MobileLLM shows deep-and-thin architectures yield 2.7–4.3% accuracy gains over balanced designs at 125M–350M scale by composing abstract concepts through layers rather than spreading parameters across width.
Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This makes them vulnerable to perturbation and distribution shift invisible to standard evaluation metrics.
Sharp, unpredictable capability transitions vanish when using continuous metrics instead of discontinuous ones. The same model outputs show smooth predictable improvement with scale, suggesting emergence is a measurement choice rather than a real behavioral change.
A 1.5B parameter model with LoRA-only post-training matched larger full-parameter RL models on reasoning tasks, suggesting RL teaches output format organization rather than new factual knowledge. This efficiency indicates reasoning and knowledge storage are separable capabilities.
RLAD jointly trains abstraction and solution generators, showing that allocating test-time compute to diverse abstractions outperforms parallel solution sampling at large budgets. Abstractions create structured breadth-first exploration that prevents the underthinking failure mode of depth-only reasoning chains.
Research shows that for synthetic data generation, models around 500M parameters outperform larger ones in output diversity per sample. Larger models concentrate probability mass on preferred outputs, reducing the variety of distinct samples generated within a fixed budget.