INQUIRING LINE

When choosing training data, does knowing whether an example helps your specific goal beat simply ranking examples by difficulty?

Can gradient-based influence scores beat difficulty metrics for identifying valuable training data?

This line compares two rival philosophies of data selection. Gradient-based influence asks whether an example moves the model toward a specific target capability; difficulty scoring ranks examples by how hard or redundant they are, independent of any goal. The corpus suggests neither wins cleanly: the answer depends on what 'valuable' means and for whom.

The gradient camp's strongest evidence is striking: LESS uses low-rank gradient features to pick the 5% of instruction data whose learning signal most resembles the target task, and training on that sliver beats training on the whole dataset (see the note 'Can we train better models on less data?'). The reason is the interesting part: mixed datasets don't just dilute, they actively hurt, because some examples shift the model's reasoning strategy away from what the task needs. Gradient influence is targeted, scoring each example relative to a destination. Difficulty metrics don't know the destination at all.
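A minimal sketch of the targeted-selection idea, assuming each training example and the target task have already been reduced to a low-dimensional gradient feature vector. The function names, the toy vectors, and the use of plain cosine similarity are illustrative assumptions; the published LESS method involves more machinery than is shown here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def select_by_influence(candidate_feats, target_feat, fraction=0.05):
    """Return (indices of the top `fraction` of candidates, all scores).

    Hypothetical helper: scores each candidate by how closely its gradient
    feature points in the same direction as the target task's feature.
    """
    scores = [cosine(f, target_feat) for f in candidate_feats]
    k = max(1, round(fraction * len(candidate_feats)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k], scores

# Toy gradient features (assumed precomputed, 2-D for readability).
candidates = [
    [1.0, 0.0],   # pulls the model toward the target
    [0.9, 0.1],   # mostly aligned
    [0.0, 1.0],   # unrelated direction
    [-1.0, 0.0],  # pulls the model away from the target
]
target = [1.0, 0.0]

selected, scores = select_by_influence(candidates, target, fraction=0.5)
print(selected)   # indices of the examples kept
print(scores[3])  # negative score: this example opposes the target direction
```

The negative score on the last candidate is the toy analogue of the "actively hurt" case: a difficulty metric would have no way to see that this example points away from the destination.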

Difficulty scoring has its own impressive result, though. Ranking examples by difficulty signals like EL2N, forgetting, and memorization, then dropping the easy redundant ones, lets data pruning beat power-law scaling: half of CIFAR-10 was removed with no accuracy loss, and self-supervised metrics scaled the approach to ImageNet (see the note 'Can we prune training data without hurting model performance?'). So difficulty is cheap, task-agnostic, and powerful when your goal is general capability rather than a narrow target. The honest read: gradient influence wins when you have a specific target distribution to aim at; difficulty wins when you're compressing a general corpus and have no such target.
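To make the difficulty side concrete, here is a small sketch of EL2N-style scoring: the L2 norm of the gap between the model's predicted class probabilities and the one-hot label. The helper names and toy logits are assumptions for illustration, and the sketch scores a single set of logits, whereas in practice such scores are usually computed early in training and averaged over runs.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def el2n(logits, label):
    """L2 norm of (predicted probabilities - one-hot label)."""
    probs = softmax(logits)
    return math.sqrt(
        sum((p - (1.0 if i == label else 0.0)) ** 2 for i, p in enumerate(probs))
    )

def prune_easiest(scores, keep_fraction):
    """Keep the hardest `keep_fraction` of examples; return their indices in order."""
    k = max(1, round(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Toy (logits, true label) pairs; all share label 0.
examples = [
    ([5.0, 0.0, 0.0], 0),  # confidently correct: easy
    ([0.0, 0.0, 0.0], 0),  # uniform guess: uncertain
    ([0.0, 5.0, 0.0], 0),  # confidently wrong: hard
    ([2.0, 1.0, 0.0], 0),  # mildly correct
]
difficulty = [el2n(logits, label) for logits, label in examples]
kept = prune_easiest(difficulty, keep_fraction=0.5)
print(kept)  # indices of the examples that survive pruning
```

Nothing in this score refers to a target task, which is what makes it cheap and task-agnostic.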

Where the corpus gets genuinely useful is in showing that difficulty alone can be a trap, which quietly argues for influence-style thinking. Overly hard RLVR samples don't just waste compute; they induce degenerate shortcuts that contaminate skills the model already had, because rare accidental successes get treated as high-value trajectories (see the note 'Do overly hard RLVR samples actually harm model capabilities?'). And teacher-refined data that is objectively higher quality still degrades a student when it sits past the student's learning frontier (see the note 'Does teacher-refined data always improve student model performance?'). Both findings say the same thing: 'hard' or 'high quality' in the abstract is the wrong axis. What matters is whether the example is compatible with where this particular model can actually move, which is exactly the relational question gradient influence tries to answer and raw difficulty cannot.
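The source note on hard RLVR samples attributes the shortcut problem to group-relative normalization. The arithmetic below illustrates why a lone success stands out, assuming binary rewards and the common mean-and-standard-deviation normalization within a group of rollouts; the group size of 16 and the function name are illustrative assumptions, and the exact formulation in the cited work may differ.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by its group's mean and (population) std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Nearly impossible prompt: 1 accidental success out of 16 rollouts.
hard_group = [1.0] + [0.0] * 15
# Moderately hard prompt: 8 successes out of 16 rollouts.
moderate_group = [1.0] * 8 + [0.0] * 8

hard_adv = group_relative_advantages(hard_group)[0]
moderate_adv = group_relative_advantages(moderate_group)[0]

print(round(hard_adv, 2))      # advantage of the lone success
print(round(moderate_adv, 2))  # advantage of a success in the balanced group
```

Under these assumptions the lone success receives an advantage close to four times that of a success in the balanced group, so whatever produced it, including a lucky shortcut, is reinforced strongly.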

The thing you didn't know you wanted to know: the real competition isn't influence-vs-difficulty as scoring formulas, it's targeted-vs-agnostic as goals. Difficulty asks 'is this example hard?' Influence asks 'is this example hard *in the direction I want to go*?' The mounting evidence that mismatched-but-impressive data backfires suggests the field is drifting toward the second question — even when it still uses cheap difficulty proxies to approximate it.

Sources (4 notes)

Can we train better models on less data?

LESS uses low-rank gradient features to select instruction data most similar to target capabilities, and training on the selected 5% consistently outperforms full dataset training. The improvement occurs because mixed datasets contain examples that actively hinder specific skills by shifting reasoning strategy away from task requirements.

Can we prune training data without hurting model performance?

Research shows that ranking training examples by difficulty (EL2N, forgetting, memorization) and removing easy ones beats power-law scaling laws. On CIFAR-10, 50% of data was pruned without accuracy loss, and self-supervised metrics scaled the approach to ImageNet.

Do overly hard RLVR samples actually harm model capabilities?

Training on nearly-impossible problems causes models to learn degenerate shortcuts rather than genuine reasoning, and these shortcuts contaminate pre-existing capabilities. Group-relative normalization treats rare accidental successes as high-advantage trajectories, reinforcing answer repetition and computation-skipping instead of sound reasoning patterns.

Does teacher-refined data always improve student model performance?

Teacher-refined data degrades performance when it exceeds the student's learning frontier, even if objectively higher quality. Students should filter refinements using their own statistical profile to retain only compatible improvements.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Beyond neural scaling laws: beating power law scaling via data pruning (1.65 match)
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? (1.61 match)
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning (0.87 match)
LESS: Selecting Influential Data for Targeted Instruction Tuning (0.86 match)
Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs (0.85 match)
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? (0.85 match)
Reinforcement Learning for Reasoning in Large Language Models with One Training Example (0.85 match)
The Invisible Leash: Why RLVR May Not Escape Its Origin (0.84 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a data selection researcher re-evaluating gradient influence vs. difficulty metrics for training data valuation. The question remains open: under what regime does each method win, and has that regime shifted?

What a curated library found — and when (findings span 2022–2026; these are dated claims, not current truth):

• LESS selects the 5% of instruction data whose gradient signature matches a target task; training on that slice outperforms full-dataset training by targeting reasoning strategy alignment, not absolute difficulty (2024).
• Data pruning via difficulty metrics (EL2N, forgetting, memorization) achieves exponential rather than power-law scaling gains; removes half of CIFAR-10 with zero accuracy loss, scales to ImageNet (2022–2024).
• Overly hard RL samples induce degenerate shortcuts that contaminate previously learned skills, because accidental successes are scored as high-value trajectories (2026).
• Teacher-refined data degrades student models when it sits beyond the student's learning frontier; quality in the abstract is the wrong axis—relational compatibility matters (2024–2025).
• RL post-training amplifies behaviors latent in pretraining; subliminal signals in data propagate through RL fine-tuning, suggesting difficulty-agnostic selection may miss hidden alignment risks (2025–2026).

Anchor papers (verify; mind their dates):
• arXiv:2206.14486 (2022) – Beyond neural scaling laws: beating power law scaling via data pruning
• arXiv:2402.04333 (2024) – LESS: Selecting Influential Data for Targeted Instruction Tuning
• arXiv:2605.28388 (2026) – Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs
• arXiv:2507.14805 (2025) – Subliminal Learning: Language models transmit behavioral traits via hidden signals in data

Your task:
(1) RE-TEST EACH CONSTRAINT. For each claim above, determine whether newer model scales, curriculum learning methods, multi-stage RL harnesses, or mechanistic interpretability tools have since dissolved the tension between gradient influence and difficulty. Does influence still beat difficulty for task-specific tuning, or have difficulty proxies become good enough (via better calibration or learned weighting) to compete? Where does each method still provably hold?
(2) Surface the strongest work from the last ~6 months that either contradicts the 'targeted > agnostic' hierarchy or demonstrates a unified framework that reconciles both axes.
(3) Propose 2 research questions that assume the regime has moved: (a) Can gradient influence be efficiently approximated via learned difficulty weighting rather than explicit gradient computation? (b) Does subliminal signal propagation in RL invalidate both influence and difficulty scoring unless they account for hidden trait transmission?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
