INQUIRING LINE


How aggressively can you edit a skill document to teach an AI before it quietly loses the abilities it already had?

How tight should a textual learning rate be to prevent skill escape?

This explores a question the corpus answers obliquely: when you optimize a skill in *text* rather than weights — editing a document the way you'd nudge parameters — how cautious must each edit be to keep the model from drifting away from skills it already had?

This reads the phrase 'textual learning rate' as the aggressiveness of edits when you optimize skills in language rather than in weights, and 'skill escape' as the drift or collapse where chasing a new gain quietly erodes what the model could already do. The corpus doesn't use these exact words, but it has a surprisingly direct answer hiding in two places. The clearest is SkillOpt, which treats a skill document like a set of weights and runs a separate optimizer that proposes edits, but accepts an edit *only* when it strictly improves a held-out validation score (Can skill documents be optimized like neural network weights?). That's the answer to 'how tight': the validation gate *is* the learning rate. Edits are unbounded in ambition, but the acceptance test is ruthless, so the effective step size is whatever survives a held-out check. Skill escape is prevented not by taking small steps but by rejecting steps that don't generalize.
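
The acceptance rule is simple enough to sketch. This is a minimal illustration under toy assumptions, not SkillOpt's code: a random line toggle stands in for its separate edit optimizer, and a rule-counting function stands in for a held-out benchmark. The names `optimize_skill`, `toggle_line`, `held_out_score`, and the rule lists are hypothetical. Note that there is no step-size parameter anywhere; the strict `>` comparison is the only brake.

```python
import random
from typing import Callable

# Hypothetical toy setting: a "skill document" is a set of rule lines, and the
# held-out check rewards the rules that matter while penalizing all others.
REQUIRED = {"check units", "show work", "verify answer"}
CANDIDATE_LINES = ["check units", "show work", "verify answer",
                   "guess quickly", "skip verification"]


def toggle_line(skill: str, rng: random.Random) -> str:
    """Propose an edit: add or remove one candidate rule line."""
    lines = [line for line in skill.splitlines() if line]
    choice = rng.choice(CANDIDATE_LINES)
    if choice in lines:
        lines.remove(choice)
    else:
        lines.append(choice)
    return "\n".join(lines)


def held_out_score(skill: str) -> int:
    """Toy held-out metric: +1 per required rule, -1 per any other rule."""
    lines = set(skill.splitlines())
    return len(lines & REQUIRED) - len(lines - REQUIRED)


def optimize_skill(skill: str,
                   propose_edit: Callable[[str, random.Random], str],
                   score: Callable[[str], int],
                   steps: int,
                   seed: int = 0) -> tuple[str, int]:
    """Greedy text-space optimization behind a strict validation gate.

    Any edit may be proposed, but it is kept only if it strictly improves
    the held-out score; every other proposal is discarded.
    """
    rng = random.Random(seed)
    best = score(skill)
    accepted = 0
    for _ in range(steps):
        candidate = propose_edit(skill, rng)
        candidate_score = score(candidate)
        if candidate_score > best:  # the gate acts as the "learning rate"
            skill, best = candidate, candidate_score
            accepted += 1
    return skill, accepted


if __name__ == "__main__":
    final, accepted = optimize_skill("guess quickly", toggle_line,
                                     held_out_score, steps=200)
    print(sorted(final.splitlines()), "accepted edits:", accepted)
```

Because every proposal is scored before it is kept, an edit that adds "skip verification" is simply discarded, however the proposer arrived at it; the document only ever moves in directions the held-out check confirms.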

The self-play side of the corpus shows what happens when you remove that brake. Ctx2Skill co-evolves skills through natural-language edits with no human supervision, and its authors are explicit that the whole loop only works when adversarial pressure is balanced against a 'generalization safeguard'; without it, the system collapses (Can language models learn skills without human supervision?). So both text-space methods independently converge on the same shape: the danger isn't the size of any single edit, it's edits that optimize the training signal while quietly abandoning the broader skill. The tightness you need is exactly enough to catch that.
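
The note doesn't say how Ctx2Skill implements its safeguard, so the check below is a hypothetical illustration of the failure it guards against, not Ctx2Skill's mechanism. An edit that raises the training signal (in Ctx2Skill's case, the Judge's verdicts) while the score on held-out tasks falls is exactly "optimizing the training signal while abandoning the broader skill". `EditScores`, `passes_safeguard`, and the tolerance `max_heldout_drop` are names introduced here, and the scores are made-up numbers.

```python
from dataclasses import dataclass


@dataclass
class EditScores:
    """Scores for one proposed skill edit, before and after applying it."""
    train_before: float    # score on the signal the loop optimizes
    train_after: float
    heldout_before: float  # score on tasks the loop never trains against
    heldout_after: float


def passes_safeguard(s: EditScores, max_heldout_drop: float = 0.0) -> bool:
    """Hypothetical generalization safeguard.

    Keep an edit only if it improves the training signal AND does not lower
    the held-out score by more than `max_heldout_drop`. An edit that games
    the training signal while eroding the broader skill is rejected.
    """
    train_gain = s.train_after - s.train_before
    heldout_change = s.heldout_after - s.heldout_before
    return train_gain > 0 and heldout_change >= -max_heldout_drop


if __name__ == "__main__":
    honest = EditScores(0.60, 0.75, 0.58, 0.59)  # both scores improve
    escape = EditScores(0.60, 0.90, 0.58, 0.41)  # train up, held-out down
    print(passes_safeguard(honest), passes_safeguard(escape))
```

The "escape" edit shows the largest training gain of the two, which is why a loop that watches only its own reward would prefer it; the held-out comparison is what exposes it.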

What makes this interesting is that the weight-space literature reaches the *same* conclusion through a completely different door, which suggests it's a property of learning, not of the medium. Staying close to the base model (low KL drift) preserves plasticity, the ability to keep learning later tasks; parameter-only RL that drifts hard stalls when the domain shifts (Does staying close to the base model preserve learning ability?). 'KL drift from base' is the weight-space twin of your 'textual learning rate': how far you let yourself move from where you started. There, skill escape looks like progress on the current task while the ability to learn the next one quietly erodes.
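
"KL drift from base" is a computable quantity: compare the tuned model's next-token distribution with the base model's, D_KL(p‖q) = Σ p(x) log(p(x)/q(x)). A minimal sketch, assuming toy two-token distributions invented for illustration and the direction KL(tuned ‖ base); the note doesn't say which direction or estimator the cited work uses.

```python
import math
from typing import Dict


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in nats for discrete distributions over tokens.

    Here p is the tuned model's next-token distribution and q is the base
    model's, so the value measures how far tuning moved away from base.
    """
    total = 0.0
    for token, p_x in p.items():
        if p_x == 0.0:
            continue  # terms with p(x) = 0 contribute nothing
        q_x = q.get(token, 0.0)
        if q_x == 0.0:
            return math.inf  # tuned model puts mass where base has none
        total += p_x * math.log(p_x / q_x)
    return total


if __name__ == "__main__":
    base = {"a": 0.5, "b": 0.5}
    near = {"a": 0.6, "b": 0.4}   # small drift from base
    far = {"a": 0.95, "b": 0.05}  # large drift from base
    print(round(kl_divergence(near, base), 4),
          round(kl_divergence(far, base), 4))
```

In practice the per-context values would be averaged over many prompts; the point of the toy numbers is only that a modest shift in probabilities gives a small divergence, while pushing most of the mass onto one token gives a much larger one.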

Then the corpus delivers the genuinely counterintuitive part: a tight brake costs less than you'd expect, because the *floor* on a useful step is shockingly low. In RLVR, a single training example lifts math accuracy from 36% to 73.6%, and test accuracy keeps climbing for 1,400 steps after training accuracy has already reached 100% (Can a single training example unlock mathematical reasoning?). The implied lesson for textual optimization: a good edit doesn't *teach* a skill so much as *activate* a latent one, which means you can afford a very tight learning rate and still get large gains. You don't need aggressive edits to make progress, which is precisely why you can afford the strict validation gate that prevents escape.

So the synthesis: there's no single tightness number, because the corpus reframes the question. The right brake isn't a smaller step; it's a held-out acceptance test (Can skill documents be optimized like neural network weights?) plus a generalization safeguard (Can language models learn skills without human supervision?), the text-space versions of staying near base (Does staying close to the base model preserve learning ability?). And you can keep that brake tight precisely because activation, not aggression, is what makes edits pay off (Can a single training example unlock mathematical reasoning?). If you want one more thread to pull, the context-integration work shows the failure from the other side: strong prior associations can simply override a new instruction, so a 'too-gentle' textual edit may not move the model at all (Why do language models ignore information in their context?).

Sources 5 notes

Can skill documents be optimized like neural network weights?

SkillOpt demonstrates that skill documents can be systematically improved through a separate optimizer that proposes edits, accepting only changes that strictly improve held-out validation scores. This approach outperforms baselines across 52 experimental cells and produces skills that transfer between models.

Can language models learn skills without human supervision?

Ctx2Skill's three-role self-play loop manufactures missing feedback through internal signals: the Challenger escalates difficulty as curriculum, the Judge gives binary verdicts as reward, and both sides evolve via natural-language skill edits. Success requires balancing adversarial pressure against a generalization safeguard to prevent collapse.

Does staying close to the base model preserve learning ability?

FST-trained models stay up to 70% closer to their base distribution than parameter-only RL, and this reduced drift preserves the model's ability to learn subsequent tasks effectively. Parameter-only approaches stall when task domains change, while low KL drift enables sustained adaptation.

Can a single training example unlock mathematical reasoning?

A single example in RLVR boosts math performance from 36% to 73.6% and enables test accuracy to improve for 1,400 steps after training accuracy reaches 100%, revealing that minimal activation signals unlock latent reasoning capability.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM researcher auditing a synthesis claim about textual learning rates and skill escape. The claim: that validation gates (held-out checks) and generalization safeguards prevent drift better than step-size tuning, and that tight brakes work because skills activate rather than accumulate. A curated library on in-context learning and text-space optimization (2022–2026) supports this. Your job is to test whether that regime still holds.

What a curated library found — and when (dated claims, not current truth):
• SkillOpt: text-space skill edits stay generalized only under a held-out validation gate; no edit survives without strict held-out improvement (~2026).
• Ctx2Skill self-play co-evolution collapses without adversarial balancing + generalization safeguard; the brake is structural, not tunable (~2026).
• Weight-space RL: low KL drift from base model preserves plasticity for continual learning; high drift causes catastrophic stalling on domain shift (~2025–2026).
• RLVR: one training example lifts math accuracy to 73.6%; test accuracy climbs 1,400 steps post-saturation—activation, not aggression, drives gains (~2025).
• Context integration fails when prior associations override new instructions; 'too-gentle' edits may not move the model at all (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2605.23904 SkillOpt (2026)
• arXiv:2604.27660 Ctx2Skill (2026)
• arXiv:2504.09522 Continual learning + KL drift (2025)
• arXiv:2504.20571 RLVR one-shot math (2025)

Your task:
(1) RE-TEST EACH CONSTRAINT. For validation gates, generalization safeguards, and KL-drift proxies in text space: have newer model scales, multi-agent orchestration, or long-horizon planning methods since RELAXED the need for held-out checks, or has the constraint tightened? For activation-vs-aggression: does in-context learning at scale still show the floor effect (one example → large gains), or do scaling laws flatten it? Separate the durable insight (skills are latent, not learned) from the perishable mechanism (validation gates suffice). Cite what supersedes it.
(2) Surface the strongest work from the last ~6 months that contradicts the 'validation-gate-as-learning-rate' frame—e.g., adversarial robustness of text edits, or failure modes of held-out checks under distribution shift.
(3) Propose 2 research questions that assume the regime has moved: (a) Do multi-agent feedback loops eliminate the need for external validation? (b) Can adaptive validation thresholds scale skill-escape prevention to longer horizons?

Cite arXiv IDs; flag anything you cannot ground in a real paper.

