INQUIRING LINE

AI often learns more from studying its own failures than from correct examples — positive signals narrow thinking, negative ones just cut out bad paths.

Why does negative experience transfer better than positive examples alone?

This explores why training, prompting, and memory systems often gain more from failures and negative signals than from positive examples by themselves — and what the corpus says is actually happening when they do.

The corpus gives a surprisingly consistent answer across very different methods: positive-only signals concentrate probability mass and narrow behavior, while negative signals prune the space without collapsing it. The clearest version comes from reinforcement learning, where training on *only* negative samples often matches or beats full RL (PPO, GRPO) on Pass@k, because suppressing incorrect trajectories preserves diversity, whereas positive-only reinforcement degrades higher-k performance by piling probability onto a few winning paths (see: Does negative reinforcement alone outperform full reinforcement learning?). Positive examples teach 'do more of this'; negative ones teach 'this region is bad', and the second leaves far more of the space intact.
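The concentrate-versus-prune contrast can be made concrete with a single policy-gradient step on a toy softmax policy. The sketch below is only an illustration: the four paths, their starting probabilities, the learning rate, and the helper names `softmax` and `reinforce_step` are all invented here and are not taken from the cited work. It applies the standard REINFORCE logit update for one sampled path, once with a positive reward on a correct path and once with a negative reward on an incorrect one.

```python
import math

def softmax(logits):
    """Convert a dict of logits into a dict of probabilities."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def reinforce_step(probs, sampled, reward, lr=1.0):
    """One REINFORCE step on a softmax policy for a single sampled path.

    The gradient of log p[sampled] with respect to logit j is
    (1 if j == sampled else 0) - p[j].
    """
    logits = {k: math.log(p) for k, p in probs.items()}
    for k in logits:
        indicator = 1.0 if k == sampled else 0.0
        logits[k] += lr * reward * (indicator - probs[k])
    return softmax(logits)

# Invented toy policy: A and B are correct paths, C and D are incorrect.
p0 = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}

# Positive-only signal: reward the sampled correct path A.
after_positive = reinforce_step(p0, sampled="A", reward=+1.0)
# Negative-only signal: penalize the sampled incorrect path C.
after_negative = reinforce_step(p0, sampled="C", reward=-1.0)

for name, dist in [("start", p0),
                   ("positive on A", after_positive),
                   ("negative on C", after_negative)]:
    print(name, {k: round(v, 3) for k, v in dist.items()})
```

In this toy setup, rewarding A pulls mass away from every other path, including the other correct path B (roughly 0.30 down to 0.18). Penalizing C instead pushes mass onto both A and B (B rises to roughly 0.34), so both correct options stay available. One step on a four-path policy says nothing about Pass@k at scale; it only shows the direction of the update.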

The same asymmetry shows up in memory and agent learning, but with a twist about *how* each type of experience should be stored. SkillRL keeps successes as concrete demonstrations to imitate but abstracts failures into general lessons, and that differential treatment outperforms processing everything uniformly (see: Should successful and failed episodes be processed differently?). ReasoningBank pushes the point further: distilling strategy-level hints from both successes *and* failures beats success-only memory, because failures carry information about boundaries and pitfalls that a clean success never reveals (see: Can agents learn better from their failures than successes?). A failure tells you where the cliff is; a success only tells you one safe path along the ridge.
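As a rough illustration of differential storage, the sketch below keeps successful trajectories verbatim and stores only an abstracted lesson for failures. Every name here (`Episode`, `AsymmetricMemory`, `naive_lesson`) is hypothetical and does not reflect the actual SkillRL or ReasoningBank implementations. The abstraction step is a fixed template; a real system would presumably use a model to write the lesson.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Episode:
    task: str
    trajectory: str  # raw action/observation log
    success: bool

def naive_lesson(episode: Episode) -> str:
    """Placeholder abstraction step (assumption: a real system would use an LLM)."""
    return f"On tasks like '{episode.task}', avoid the approach that failed."

@dataclass
class AsymmetricMemory:
    abstract: Callable[[Episode], str] = naive_lesson
    demos: List[str] = field(default_factory=list)    # successes, kept verbatim
    lessons: List[str] = field(default_factory=list)  # failures, abstracted

    def add(self, episode: Episode) -> None:
        if episode.success:
            # Successes are stored as concrete demonstrations to imitate.
            self.demos.append(episode.trajectory)
        else:
            # Failures are reduced to a general lesson; the raw log is dropped.
            self.lessons.append(self.abstract(episode))

    def build_context(self) -> str:
        parts = ["Demonstrations:", *self.demos,
                 "Lessons from failures:", *self.lessons]
        return "\n".join(parts)

memory = AsymmetricMemory()
memory.add(Episode("book a flight", "open site; pick date; pay", success=True))
memory.add(Episode("book a flight", "open site; click ad; timeout", success=False))
print(memory.build_context())
```

Dropping the raw failed trajectory is the point of the design. The context carries the boundary ("avoid this") without the noisy step-by-step log, which is one plausible reason the source note reports lower context use than uniform approaches.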

Even without any training, the effect holds at inference time. LEAP deliberately induces a model to err on its own few-shot examples, then has it articulate explicit principles from those mistakes, and this improves reasoning without a single extra label (see: Does learning from mistakes improve in-context learning?). The mistake forces the model to name the rule it was implicitly violating, which a correct example would have let it skate past. This is why positive examples 'alone' underperform: a correct demonstration is compatible with many wrong generalizations, and the learner has no pressure to distinguish them.
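The mistake-then-principle loop described above can be sketched in a few lines. This is a loose approximation under stated assumptions, not LEAP's actual prompts or code. `Model` is a hypothetical text-in, text-out callable, the prompt wording is invented, and how errors are induced (for example, by sampling at a higher temperature) is left to whatever sits behind that callable.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]  # hypothetical text-in, text-out interface

def learn_principles(model: Model, examples: List[Tuple[str, str]]) -> List[str]:
    """Collect one principle per few-shot example the model gets wrong."""
    principles = []
    for question, gold in examples:
        attempt = model(f"Answer the question.\nQ: {question}\nA:")
        if attempt.strip() != gold.strip():
            # Only mistakes trigger reflection; correct attempts add nothing.
            reflection_prompt = (
                "Reflect on the mistake and state one general principle.\n"
                f"Q: {question}\nWrong answer: {attempt}\nCorrect answer: {gold}\n"
                "Principle:"
            )
            principles.append(model(reflection_prompt).strip())
    return principles

def build_prompt(examples: List[Tuple[str, str]],
                 principles: List[str],
                 new_question: str) -> str:
    """Prepend the learned principles to the usual few-shot prompt."""
    lines = ["Principles:"]
    lines += [f"- {p}" for p in principles]
    for question, gold in examples:
        lines += [f"Q: {question}", f"A: {gold}"]
    lines += [f"Q: {new_question}", "A:"]
    return "\n".join(lines)
```

No new labels enter the loop. The gold answers are the ones already present in the few-shot examples, and the only added signal is the model's own explanation of where it went wrong.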

A less obvious consequence: the failure mode of positive-only learning is not just weaker performance, it's *confident* weakness. Teachers conditioned on correct answers and verifier output produce concise, confident traces that suppress uncertainty; students inherit the swagger and lose out-of-distribution robustness (see: Does richer teacher context hurt student generalization?). Imitation training shows the endpoint: copying ChatGPT's fluent, confident style fools human evaluators without improving factuality or generalization on novel tasks (see: Can imitating ChatGPT fool evaluators into thinking models improved?). Positive examples are easy to mimic stylistically, which is exactly why they transfer the *look* of competence rather than the thing itself. Negative experience is harder to shortcut this way, because a lesson about where the errors are has no surface style to copy.

There's a real limit worth naming, though: negative signal only helps when the failures are informative. Nearly impossible RLVR samples produce degenerate shortcuts that contaminate existing skills, because group-relative normalization treats the rare accidental success as a high-advantage trajectory (see: Do overly hard RLVR samples actually harm model capabilities?). So the principle isn't 'negativity is magic'. It's that negative experience carries discriminative information positive examples can't, as long as the failures sit close enough to the model's frontier to mean something.
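A short calculation shows why a rare accidental success looms so large under group-relative normalization. The sketch below assumes the common form of the normalization, advantage = (reward - group mean) / group standard deviation, with binary rewards, a population standard deviation, and no stabilizing epsilon. The group size of 16 and the function name `group_relative_advantages` are invented for illustration.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """Normalize rewards within one sampled group: (r - mean) / std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    if sigma == 0:
        # All rollouts scored the same, so the group carries no signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Near-impossible problem: 1 accidental success in 16 rollouts.
near_impossible = [0.0] * 15 + [1.0]
# Frontier problem: 8 successes in 16 rollouts.
frontier = [0.0] * 8 + [1.0] * 8

rare_success_adv = group_relative_advantages(near_impossible)[-1]
typical_success_adv = group_relative_advantages(frontier)[-1]

print(f"advantage of the lone success:     {rare_success_adv:.3f}")
print(f"advantage of a frontier success:   {typical_success_adv:.3f}")
```

With a success rate p in the group, the successful rollout's advantage works out to sqrt((1 - p) / p). That is sqrt(15), about 3.87, for the lone success, versus 1.0 when half the group succeeds. Under these assumptions the lucky trajectory is reinforced nearly four times as hard per sample, whatever shortcut produced it.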

Sources 7 notes

Does negative reinforcement alone outperform full reinforcement learning?

Training with only negative samples consistently improves Pass@k across the spectrum, often matching full PPO and GRPO. Negative reinforcement suppresses incorrect trajectories while preserving diversity, whereas positive-only reinforcement degrades higher-k performance by concentrating probability mass.

Should successful and failed episodes be processed differently?

SkillRL demonstrates that treating successful episodes as concrete demonstrations and failures as abstracted lessons achieves state-of-the-art performance on complex tasks while using substantially less context than uniform approaches. The asymmetry mirrors human expert reasoning and avoids the degradation seen in uniform consolidation methods.

Can agents learn better from their failures than successes?

ReasoningBank shows that storing strategy-level reasoning hints from both self-judged successes and failures outperforms success-only memory and raw trajectory storage. Coupled with test-time scaling, memory and compute compound rather than substitute, creating a novel scaling law where accuracy improves through cumulative interaction history.

Does learning from mistakes improve in-context learning?

LEAP demonstrates that models achieve better performance on reasoning and math tasks by intentionally erring on few-shot examples, reflecting on mistakes, and deriving explicit task-specific principles—without additional labeled data or fine-tuning.

Does richer teacher context hurt student generalization?

Teachers conditioned on correct answers and verifier output produce confident, concise traces that students inherit. This style suppresses uncertainty expression, optimizing in-domain performance while degrading generalization to out-of-distribution problems that require epistemic caution.

Can imitating ChatGPT fool evaluators into thinking models improved?

Imitation models fool human evaluators by mimicking ChatGPT's confident, fluent style while failing to improve factuality or generalization on novel tasks. The ceiling is set by base model capability, not fine-tuning method—better fundamentals, not shortcuts, drive real improvement.

Do overly hard RLVR samples actually harm model capabilities?

Training on nearly-impossible problems causes models to learn degenerate shortcuts rather than genuine reasoning, and these shortcuts contaminate pre-existing capabilities. Group-relative normalization treats rare accidental successes as high-advantage trajectories, reinforcing answer repetition and computation-skipping instead of sound reasoning patterns.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a capability researcher re-testing constraints on learning asymmetries. Core question: Does negative experience (failures, mistakes, boundary signals) transfer better than positive examples alone — and if so, does that advantage persist or dissolve under current models, training methods, and evals?

What a curated library found — and when (dated claims, not current truth):
Findings span 2022–2026; treat as perishable thresholds:
- Negative-only RL matches or exceeds full RL by preserving behavioral diversity while suppressing incorrect paths; positive-only reinforcement narrows high-k performance (~2025, arXiv:2506.01347).
- Differential storage (concrete successes as demos, failures abstracted into lessons) outperforms uniform processing; failures encode boundaries that successes occlude (~2025, arXiv:2509.25140).
- Inducing models to err on few-shot examples, then extracting explicit principles, improves reasoning without extra labels; correct examples license wrong generalizations (~2024, arXiv:2402.05403).
- Positive-only teaching produces confident, concise but brittle student traces; imitation captures style (fluency, confidence) but not factuality or OOD robustness (~2023, arXiv:2305.15717; ~2026, arXiv:2603.24472).
- Overly hard negative samples (near-impossible failures) induce degenerate shortcuts; negative signal only transfers when failures sit near the model's frontier (~2025, arXiv:2507.22844).

Anchor papers (verify; mind their dates):
- arXiv:2305.15717 (2023) — style-vs-substance in imitation
- arXiv:2506.01347 (2025) — negative RL effectiveness
- arXiv:2509.25140 (2025) — ReasoningBank differential memory
- arXiv:2507.22844 (2025) — sample difficulty and degenerate shortcuts

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above: Do newer scaling (o1, o3, etc.), constitutional AI methods, synthetic preference data at scale, or multi-turn reasoning chains since mid-2025 relax the positive-only bottleneck? Does confidence collapse still occur when models are trained on reasoning traces vs. answer-only imitation? Does differential memory generalize to frontier models, or only to smaller ones? Separate the durable asymmetry (likely still open: negative = pruning, positive = concentration) from perishable limitations (e.g., does better calibration fix the confidence issue?).
(2) Surface the strongest contradicting or superseding work from the last ~6 months. If any recent paper finds positive-only baselines *do* preserve diversity or match negative conditioning, flag it and explain the difference in method/eval.
(3) Propose 2 research questions that assume the regime may have shifted:
   - If negative experience now transfers worse under multi-turn constitutional training or RL-from-mL, why?
   - Do frontier models with chain-of-thought training recover the positive-only signal, or does the asymmetry sharpen?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
