INQUIRING LINE


The penalty that shrinks AI weights gets blamed for making models grip training data — but is it actually the cause?

Does weight decay directly cause contractive behavior near training examples?

This explores whether weight decay — the L2 penalty added during training — is itself the mechanism that makes a model collapse onto, or become locally rigid around, the examples it was trained on, versus that contractive behavior arising from other forces.
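To make the premise concrete: in its textbook form, weight decay adds (λ/2)·‖w‖² to the loss, so every gradient step also pulls each weight toward zero. Below is a minimal plain-Python sketch under that assumption. It uses plain SGD, where an L2 penalty and decoupled weight decay coincide. The function names and the single-example quadratic loss 0.5·‖w − x‖² are hypothetical, chosen only for illustration. The sketch shows that the penalty's pull is toward the origin, not toward any training example: with decay λ, SGD settles at x/(1 + λ), shrunk away from x rather than collapsed onto it.

```python
def sgd_step(w, grad, lr, weight_decay):
    """One plain-SGD step on loss + (weight_decay / 2) * ||w||^2.

    The L2 penalty contributes weight_decay * w to the gradient, so each
    weight is nudged toward zero on every step, regardless of the data.
    """
    return [wi - lr * (gi + weight_decay * wi) for wi, gi in zip(w, grad)]


def train_on_one_example(x, w0, lr, weight_decay, steps):
    """Minimise the hypothetical data loss 0.5 * ||w - x||^2 with weight decay."""
    w = list(w0)
    for _ in range(steps):
        grad = [wi - xi for wi, xi in zip(w, x)]  # gradient of the data loss only
        w = sgd_step(w, grad, lr, weight_decay)
    return w


# With no data gradient, decay alone shrinks weights geometrically:
# each step multiplies w by (1 - lr * weight_decay) = 0.95 here.
w = [2.0, -4.0]
for _ in range(10):
    w = sgd_step(w, [0.0, 0.0], lr=0.1, weight_decay=0.5)

# With a data term, the fixed point is x / (1 + weight_decay), not x itself.
x = [3.0, -1.5]
w_fit = train_on_one_example(x, [0.0, 0.0], lr=0.1, weight_decay=0.5, steps=500)
print(w, w_fit)  # w_fit is close to [2.0, -1.0]
```

Nothing in the penalty term refers to the training inputs, so any rigidity "near the data" has to come from how the penalty interacts with the loss and the optimizer, which is the distinction this line of inquiry is probing.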

This reads the question as asking whether weight decay is the *direct cause* of a model becoming contractive — locally flattened or rigid — near its training data. Up front: the corpus doesn't contain a note that isolates weight decay and tests it as the causal lever, so a clean yes/no isn't available here. What the corpus does offer is a striking lateral reframing: several of the strongest results show contraction-like behavior emerging *without any explicit regularizer at all*, which complicates the premise that weight decay is doing the work.

The sharpest example is the finding that RL fine-tuning updates only 5–30% of parameters, in subnetworks that are sparse yet nearly full-rank, and does so with no explicit regularization term in the loss (Does reinforcement learning update only a small fraction of parameters?). That these updates are nearly identical across random seeds points to structural pressure built into the optimization dynamics themselves, not into a decay penalty. If you're asking whether a regularizer is *required* to get tight, concentrated, sparse changes near the data, this says no: the structure shows up on its own.
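One way to make "updates only 5–30% of parameters" and "nearly identical across seeds" operational is to diff checkpoints. The sketch below is a hypothetical plain-Python illustration, not the underlying paper's procedure. Each checkpoint is treated as a flattened list of parameter values, and a parameter counts as updated when it moved by more than a tolerance `tol`, which is an assumed threshold; the paper's exact criterion may differ. Cross-seed agreement is measured as the Jaccard overlap of the updated index sets. The sketch does not test the "nearly full-rank" part of the claim, which would require the rank of each per-matrix update.

```python
def updated_indices(base, tuned, tol=0.0):
    """Indices of parameters that moved by more than tol during fine-tuning."""
    return {i for i, (b, t) in enumerate(zip(base, tuned)) if abs(t - b) > tol}


def update_fraction(base, tuned, tol=0.0):
    """Fraction of parameters that changed (i.e. 1 - update sparsity)."""
    return len(updated_indices(base, tuned, tol)) / len(base)


def seed_overlap(base, tuned_a, tuned_b, tol=0.0):
    """Jaccard overlap between the parameter sets updated by two runs."""
    a = updated_indices(base, tuned_a, tol)
    b = updated_indices(base, tuned_b, tol)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy 10-parameter "model" fine-tuned from the same base with two seeds.
base = [0.0] * 10
seed_a = [0.0, 0.2, 0.0, 0.0, -0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
seed_b = [0.0, 0.3, 0.0, 0.0, -0.1, 0.0, 0.0, 0.05, 0.0, 0.0]

print(update_fraction(base, seed_a))       # 0.2
print(update_fraction(base, seed_b))       # 0.3
print(seed_overlap(base, seed_a, seed_b))  # about 0.667
```

On real checkpoints the same comparison would run per tensor, and the choice of `tol` matters. With low-precision weights, tiny optimizer steps can round away, so "unchanged" should be defined carefully before reading much into the fraction.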

The corpus is also rich on the broader phenomenon of training pressure pulling a model inward toward a narrow region, which is the behavior people often attribute to over-regularization. RL post-training converges onto a single dominant pretraining format and suppresses the alternatives within the first epoch (Does RL training collapse format diversity in pretrained models?). Positive-only reinforcement degrades diversity by concentrating probability mass, while negative reinforcement preserves it (Does negative reinforcement alone outperform full reinforcement learning?). SFT-then-RL on divergent expert data passes through disruption and readaptation before ending in an overfitting phase (Why does SFT-then-RL training follow a predictable three-phase pattern?). None of these notes attributes the collapse to weight decay; they trace it to the reward signal, the data mix, and the update rule. That's the lateral takeaway: contraction near training examples in these systems is mostly a *dynamics* story, not a *penalty* story.
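Diversity effects like these are usually scored with Pass@k, the chance that at least one of k samples is correct. This is the metric the negative-reinforcement note reports. The sketch below uses the standard unbiased estimator 1 − C(n − c, k) / C(n, k), computed from n samples per problem with c of them correct. The two policies and their per-problem counts are hypothetical. They were chosen to show how a policy that concentrates probability mass can win at k = 1 yet lose at large k.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over problems given (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical (n, c) counts on the same four problems, 16 samples each.
concentrated = [(16, 15), (16, 15), (16, 0), (16, 0)]  # sharp on two problems
diverse = [(16, 8), (16, 8), (16, 2), (16, 2)]         # spread across all four

for k in (1, 16):
    print(k, mean_pass_at_k(concentrated, k), mean_pass_at_k(diverse, k))
```

Here the concentrated policy scores higher at k = 1 (0.46875 vs 0.3125), but at k = 16 it solves only half the problems while the diverse policy solves all of them. This is the shape of the higher-k degradation the note attributes to positive-only reinforcement.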

There's also a counter-lever worth knowing about, because it's the closest thing the corpus has to a knob that governs how far a model is allowed to move: KL drift from the base model. Keeping drift low (staying close to the base distribution) preserves plasticity and the ability to keep learning, whereas parameter-only RL, which drifts further from the base, stalls when the task domain shifts (Does staying close to the base model preserve learning ability?). This is the inverse framing of your question: rather than weight decay forcing local rigidity, *too much* unconstrained movement is what destroys adaptability, and a soft constraint toward the base is what keeps the model supple. So if you came looking for 'is the regularizer the villain,' the corpus gently flips it: the more documented failure mode is uncontrolled drift, with structured contraction often emerging on its own.
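As a rough illustration of what "KL drift" measures, the sketch below computes KL(tuned ‖ base) over a single next-token distribution and uses it in the common reward − β·KL shaping. Several things here are assumptions: the note does not state the KL direction or how drift is aggregated, so KL(tuned ‖ base) is chosen for illustration. The three-token distributions and β are hypothetical. This is not the FST method from the note. It only shows why a penalty of this kind acts as a soft pull toward the base rather than toward zero.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions on the same support.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_penalized_reward(reward, p_policy, p_base, beta):
    """Reward shaped by a drift penalty: reward - beta * KL(policy || base)."""
    return reward - beta * kl_divergence(p_policy, p_base)


# Hypothetical next-token distributions over a 3-token vocabulary.
base = [0.5, 0.3, 0.2]
near = [0.55, 0.28, 0.17]  # small move away from the base
far = [0.9, 0.05, 0.05]    # large move that concentrates mass on one token

print(kl_divergence(near, base), kl_divergence(far, base))
print(kl_penalized_reward(1.0, near, base, beta=0.1),
      kl_penalized_reward(1.0, far, base, beta=0.1))
```

The penalty is zero when the policy matches the base and grows as mass is concentrated. Unlike weight decay, its reference point is the base model's distribution, not the origin of parameter space.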

Sources 5 notes

Does reinforcement learning update only a small fraction of parameters?

Across seven RL algorithms and ten LLM families, RL intrinsically updates only 5–30% of parameters, without explicit regularization. Critically, these sparse updates are nearly full-rank and nearly identical across random seeds, indicating structural rather than arbitrary parameter selection.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Does negative reinforcement alone outperform full reinforcement learning?

Training with only negative samples consistently improves Pass@k across the spectrum, often matching full PPO and GRPO. Negative reinforcement suppresses incorrect trajectories while preserving diversity, whereas positive-only reinforcement degrades higher-k performance by concentrating probability mass.

Why does SFT-then-RL training follow a predictable three-phase pattern?

CHORD identifies three distinct training phases: initial capability disruption from policy shift, readaptation to expert patterns, then overfitting. Dynamically weighting SFT as an auxiliary objective within on-policy RL resolves this progression and improves stability.

Does staying close to the base model preserve learning ability?

FST-trained models stay up to 70% closer to their base distribution than parameter-only RL, and this reduced drift preserves the model's ability to learn subsequent tasks effectively. Parameter-only approaches stall when task domains change, while low KL drift enables sustained adaptation.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training (2.45 match, arXiv)
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining (1.71 match, arXiv)
Learning, Fast and Slow: Towards LLMs That Adapt Continually (1.67 match, arXiv)
The Art of Scaling Reinforcement Learning Compute for LLMs (1.67 match, arXiv)
Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs (1.65 match, arXiv)
Reinforcement Learning for Reasoning in Large Language Models with One Training Example (1.65 match, arXiv)
Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces! (1.61 match, arXiv)
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models (0.90 match, arXiv)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a mechanistic analyst re-testing whether weight decay is the *direct causal lever* for contraction near training examples in LLMs—a claim that a curated library (2025–2026) treats as uncertain, even as contraction itself is well-documented.

What a curated library found — and when (dated claims, not current truth):
— RL fine-tuning updates only 5–30% of parameters in sparse subnetworks with NO explicit regularization, yet updates are nearly identical across seeds, implying structural (not penalty-driven) pressure (arXiv:2505.11711, ~2025).
— RL post-training converges onto a single dominant pretraining format within one epoch; diversity collapse is driven by reward signal and data mix, not weight decay (arXiv:2504.07912, ~2025).
— Lower KL drift from the base model *preserves* plasticity and prevents stalling under domain shift—the inverse framing: uncontrolled movement, not regularization, is the failure mode (arXiv:2605.12484, ~2026).
— SFT-then-RL on divergent data moves through disruption and readaptation phases before overfitting, a progression not attributed to weight decay (arXiv:2508.11408, ~2025).

Anchor papers (verify; mind their dates):
— arXiv:2505.11711 (2025): Subnetwork sparsity in RL updates
— arXiv:2504.07912 (2025): RL amplification of pretraining behaviors
— arXiv:2605.12484 (2026): KL drift and continual learning
— arXiv:2508.11408 (2025): SFT–RL harmonization and failure modes

Your task:
(1) RE-TEST THE PREMISE: Does the latest evidence (last 6 months) still separate contraction-via-weight-decay from contraction-via-dynamics? For each finding above, determine whether newer model scales, training regimes (e.g., scaled RL, multi-task SFT), or mechanistic probes have since revealed weight decay *does* play a direct causal role, or whether the dynamics story has only strengthened. Cite what changed.
(2) SURFACE CONTRADICTING WORK: Find the strongest recent paper (if any) that re-centers weight decay as a primary driver of local rigidity, or that challenges the "no-regularizer-still-contracts" claim.
(3) PROPOSE 2 FORWARD QUESTIONS: (a) Under what scaling regime (model size, RL iterations, reward model confidence) does the geometry flip—does weight decay become necessary to prevent divergence rather than to cause contraction? (b) Can you design an intervention (e.g., layer-wise weight decay tuning, or drift-aware annealing) that decouples the penalty from the dynamics story?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
