INQUIRING LINE

RL fine-tuning edits just 5–30% of weights, and different runs land on nearly the same ones, suggesting RL finds structure rather than installing it.

Why do sparse parameter subsets enable full-rank learning in RL?

This explores a surprising finding about RL fine-tuning: it rewrites only a small slice of a model's weights (5–30%), yet those edits aren't confined to a few directions — they're full-rank — and the corpus suggests this is because RL reorganizes capabilities the model already has rather than installing new ones.

RL's weight changes can be both sparse (touching few parameters) and full-rank (spanning many directions) at once, a combination that sounds contradictory until you look at what RL is actually doing inside the model. The anchor result is that across seven RL algorithms and ten model families, RL spontaneously updates only 5–30% of parameters without any explicit sparsity regularization, and those updates are nearly full-rank and nearly identical across random seeds (see "Does reinforcement learning update only a small fraction of parameters?"). The seed-stability is the tell: if RL were just nudging weights arbitrarily, different runs would pick different parameters. Instead they converge on nearly the same subnetwork, which suggests the sparsity is structural: RL is finding a specific, pre-existing pathway, not carving a new one.
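The two measurements behind that claim, the fraction of parameters updated and how much the updated sets overlap across seeds, can be sketched on toy matrices. This is a minimal illustration with NumPy, not the method of the cited work: `fake_rl_run` is a hypothetical stand-in that perturbs a fixed mask, and the 20% density, the tolerance, and the use of Jaccard overlap are all assumptions chosen for the example.

```python
import numpy as np

def update_mask(base, tuned, tol=1e-8):
    """Boolean mask of entries that changed by more than tol."""
    return np.abs(tuned - base) > tol

def update_fraction(mask):
    """Fraction of parameters that were updated."""
    return float(mask.mean())

def jaccard_overlap(mask_a, mask_b):
    """Intersection over union of two update masks (1.0 means identical)."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(mask_a, mask_b).sum() / union)

rng = np.random.default_rng(0)
base = rng.normal(size=(256, 256))

# Hypothetical "structural" subnetwork: about 20% of entries.
shared = rng.random(base.shape) < 0.20
# Control: an unrelated random 20% of entries.
unrelated = rng.random(base.shape) < 0.20

def fake_rl_run(seed, mask):
    """Stand-in for an RL run (assumption): perturbs only entries in `mask`."""
    run_rng = np.random.default_rng(seed)
    delta = run_rng.normal(scale=0.01, size=base.shape) * mask
    return base + delta

tuned_a = fake_rl_run(1, shared)      # two seeds, same subnetwork
tuned_b = fake_rl_run(2, shared)
tuned_c = fake_rl_run(3, unrelated)   # a run that picks parameters arbitrarily

mask_a = update_mask(base, tuned_a)
mask_b = update_mask(base, tuned_b)
mask_c = update_mask(base, tuned_c)

print("fraction updated:", update_fraction(mask_a))
print("overlap, same subnetwork:", jaccard_overlap(mask_a, mask_b))
print("overlap, arbitrary masks:", jaccard_overlap(mask_a, mask_c))
```

Two independent masks at density p have an expected Jaccard overlap of p / (2 - p), about 0.11 at p = 0.20, so overlap far above that baseline is what would separate structural selection from arbitrary selection.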

The 'why' comes into focus when you ask what RL is mechanistically doing. The dominant move appears to be *suppression*, not amplification: RL mostly works by pushing down wrong trajectories rather than building up correct ones, following a two-phase arc of consolidating procedures first and exploring strategy second (see "What actually changes inside a model during RL training?"). On this reading, suppressing a behavior the model already knows requires adjusting many directions of the weight space (hence full-rank) but only at the specific parameters that gate that behavior (hence sparse). You don't need to retrain the whole network to change which of its existing tendencies wins.
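Sparsity and rank are independent properties of an update, which a small numerical check makes concrete. The sketch below is an illustration under assumptions, not anything taken from the cited notes: it compares a random update touching about 20% of entries with a dense low-rank update of the kind used by adapter methods (the adapter-style contrast, the matrix size, and the rank `r = 4` are introduced here for the example).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128

# Sparse update: about 20% of entries are nonzero, at random positions.
mask = rng.random((n, n)) < 0.20
sparse_delta = rng.normal(size=(n, n)) * mask

# Dense low-rank update (adapter-style): every entry changes, rank at most r.
r = 4
low_rank_delta = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))

def describe(delta):
    """Return (fraction of nonzero entries, numerical rank) of an update."""
    nonzero_fraction = float(np.count_nonzero(delta) / delta.size)
    rank = int(np.linalg.matrix_rank(delta))
    return nonzero_fraction, rank

sparse_frac, sparse_rank = describe(sparse_delta)
dense_frac, dense_rank = describe(low_rank_delta)

print(f"sparse update:   {sparse_frac:.2f} of entries changed, rank {sparse_rank} of {n}")
print(f"low-rank update: {dense_frac:.2f} of entries changed, rank {dense_rank} of {n}")
```

The sparse update changes few entries yet spans nearly all directions, while the low-rank update changes every entry yet spans only a handful, so "few parameters" and "few directions" are different constraints.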

That framing connects to a second corpus finding that's easy to miss: RL doesn't create format or reasoning behaviors so much as it *selects* among ones pretraining already deposited. RL reliably amplifies a single dominant pretraining distribution within the first epoch while collapsing the alternatives (see "Does RL training collapse format diversity in pretrained models?"). If the raw material is already in the base model, the learning problem reduces to routing, turning up one latent capability and turning down its competitors, which is exactly the kind of change a sparse-but-full-rank edit can express.

There's a deeper hint from how networks use sparsity natively. Models already represent unfamiliar or hard inputs sparsely and familiar ones densely, a structure learned during pretraining rather than imposed later (see "Is representational sparsity learned or intrinsic to neural networks?"), and they sharpen activations into an even sparser, localized pattern under difficult or out-of-distribution tasks as a stabilizing filter (see "Do language models sparsify their activations under difficult tasks?"). So sparse, targeted modification isn't foreign to these models; it's their default organizing principle. RL appears to ride that existing structure rather than fight it.

The payoff you might not have expected: this reframes RL fine-tuning as closer to interpretable surgery than blunt retraining. Work on deliberately training sparse weights shows that sparsity forces modular, human-readable circuits where ablation can show a circuit is both necessary and sufficient for a task (see "Can sparse weight training make neural networks interpretable by design?"). If RL naturally concentrates its edits in a stable, sparse subnetwork, the same logic suggests those subnetworks may be the right unit for understanding, and eventually steering, what RL actually changed in a model.

Sources 6 notes

Does reinforcement learning update only a small fraction of parameters?

Across seven RL algorithms and ten LLM families, RL induces intrinsic parameter sparsity of 5–30% without explicit regularization. Critically, these sparse updates are nearly full-rank and nearly identical across random seeds, indicating structural rather than arbitrary parameter selection.

What actually changes inside a model during RL training?

RL's effects concentrate in structurally sparse but full-rank subnetworks across multiple algorithms and models. Suppressing wrong trajectories—rather than amplifying correct ones—appears to be the primary mechanism, with training following a predictable two-phase pattern of procedural consolidation then strategic exploration.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Is representational sparsity learned or intrinsic to neural networks?

During pretraining, neural networks develop dense activations for familiar training data and default to sparse representations for unfamiliar inputs. This trend emerges without task-specific fine-tuning and reflects how models consolidate knowledge through exposure.

Do language models sparsify their activations under difficult tasks?

As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.

Can sparse weight training make neural networks interpretable by design?

Training transformers with sparse weights creates compact, human-interpretable circuits where neurons correspond to simple concepts with clear connections. Ablation studies confirm these circuits are necessary and sufficient for task performance, though scaling beyond tens of millions of parameters while maintaining interpretability remains unsolved.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an RL mechanist evaluating whether sparse-but-full-rank parameter updates in LLM fine-tuning remain a binding constraint or have been architecturally or algorithmically dissolved. The question: why do RL subnetworks achieve full-rank updates while touching only 5–30% of parameters?

What a curated library found — and when (dated claims, not current truth):
Findings span 2024–2026; treat them as perishable claims to be re-tested:
  • Across seven RL algorithms and ten model families, RL spontaneously updates 5–30% of parameters in a stable, seed-consistent subnetwork with near-full-rank structure (arXiv:2505.11711, 2025-05).
  • RL training follows a two-phase arc in which procedural consolidation precedes strategic exploration; suppressing wrong trajectories, rather than amplifying correct ones, appears to be the dominant mechanism, with effects concentrated in sparse but full-rank subnetworks (arXiv:2505.11711, 2025-05).
  • RL post-training converges on a single dominant pretraining distribution within the first epoch, collapsing alternatives; this selection-not-creation pattern suggests routing among latent capabilities rather than building new ones (arXiv:2504.07912, 2025-04).
  • Native representational sparsity in models—learned during pretraining—is sharpened further under OOD or difficult tasks as an adaptive filter; sparse, modular structure is the model's default (arXiv:2603.03415, 2026-03; arXiv:2507.07186, 2025-07).
  • Weight sparsity naturally produces interpretable, modular circuits where ablation grounds necessity and sufficiency; RL's sparse subnetworks may inherit this interpretability (arXiv:2511.13653, 2025-11).

Anchor papers (verify; mind their dates):
  • arXiv:2505.11711 (Reinforcement Learning Finetunes Small Subnetworks in Large Language Models, 2025-05)
  • arXiv:2504.07912 (Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining, 2025-04)
  • arXiv:2511.13653 (Weight-sparse transformers have interpretable circuits, 2025-11)
  • arXiv:2603.03415 (Farther the Shift, Sparser the Representation, 2026-03)

Your task:
  (1) RE-TEST EACH CONSTRAINT. For the 5–30% sparsity claim, the full-rank-ness of updates, and the seed-stability of the subnetworks: has any newer scaling regime, longer training run, different compute allocation per step, or new hardware (e.g., sparse tensor engines) relaxed the sparsity bottleneck or forced denser updates? Has any recent RL algorithm (e.g., scaled PPO, mixture-of-experts routing) systematized routing better, making the sparse subnetwork finding moot? Distinguish the durable question (what mechanism selects which parameters?) from the perishable claim (updates must remain 5–30% sparse). Cite what changed it.
  (2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Look for papers claiming RL requires dense updates, full retraining, or that pretraining-to-RL routing is not the binding mechanism. Flag disagreement on whether suppression or amplification is dominant.
  (3) Propose 2 research questions that ASSUME the regime may have shifted: (a) If sparse subnetworks become saturated or can no longer express new RL objectives without invoking denser parameter sets, what is the scaling law for RL expressivity vs. sparsity? (b) If routing among pretraining modes is the true mechanism, can you directly control which pretraining distribution RL amplifies, and would that be more efficient than running RL?
