INQUIRING LINE

Do weight-space skills lose detail compared to textual skill descriptions?

This explores whether compiling agent skills into model weights (LoRA adapters, hidden-state interventions) throws away the richness of the same skill written out as plain-text instructions — and what each form is actually good at.


The honest answer from the corpus is that turning a skill into weights is less a loss of detail than a change in what the skill can *do*. Each form is strong where the other is weak, and the interesting question is what gets traded.

Start with the case for weights. "Can skills work better as weights than as prompts?" shows that compiling textual skills into LoRA adapters cuts prefill tokens by 64–72% while matching or beating the in-context (text) version. It also unlocks something text can't do at all: parameter arithmetic, where skills become composable objects you can add and scale. So on raw task performance the weight-space skill isn't losing detail; it's shedding token cost. What it loses is legibility: you can read, audit, and hand-edit a text skill, whereas a LoRA adapter is opaque.
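To make "add and scale" concrete, here is a minimal sketch of parameter arithmetic on low-rank adapters. It assumes the usual LoRA form, in which an adapter contributes an update B·A to a base weight matrix; the tiny matrices, the names `lora_delta` and `combine`, and the two example "skills" are hypothetical and not taken from LatentSkill.

```python
# Illustrative only: small pure-Python matrices stand in for real adapter tensors.

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_delta(B, A, scale=1.0):
    """Weight update implied by one adapter: scale * (B @ A)."""
    return [[scale * v for v in row] for row in matmul(B, A)]

def combine(base, deltas):
    """Add any number of adapter deltas to a copy of the base weights."""
    out = [row[:] for row in base]  # copy, so the base weights stay untouched
    for delta in deltas:
        for i, row in enumerate(delta):
            for j, v in enumerate(row):
                out[i][j] += v
    return out

# Hypothetical 2x2 base weight and two rank-1 "skill" adapters, each as (B, A).
W = [[1.0, 0.0], [0.0, 1.0]]
skill_a = ([[1.0], [0.0]], [[0.0, 2.0]])
skill_b = ([[0.0], [1.0]], [[3.0, 0.0]])

# Compose: half-strength skill A plus full-strength skill B.
merged = combine(W, [lora_delta(*skill_a, scale=0.5),
                     lora_delta(*skill_b, scale=1.0)])
print(merged)
```

The point of the sketch is only that adapters are numeric objects, so scaling and summing them is well defined in a way that concatenating two text prompts is not. Whether a given sum of real adapters behaves well is an empirical question.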

That legibility turns out to be the real prize of the text form. "Can skill documents be optimized like neural network weights?" makes the striking point that a skill *document* can be optimized as rigorously as weights: a separate optimizer proposes edits and keeps only those that strictly improve held-out validation scores, and the resulting skills transfer between models. A weight-space skill is baked into one model's parameters; the text skill is portable. So the 'detail' a text description holds isn't just nuance, it's transferability and inspectability.
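The propose-and-accept loop described above can be sketched as a greedy search over a text document. This is a toy under stated assumptions: `propose_edit` and `score` are hypothetical stand-ins (a real setup would call an editing model and run a held-out evaluation), and nothing here reproduces SkillOpt's actual optimizer.

```python
import random

def optimize_skill(skill, propose_edit, score, steps=20, seed=0):
    """Greedy search: keep an edit only if it strictly improves the held-out score."""
    rng = random.Random(seed)
    best, best_score = skill, score(skill)
    for _ in range(steps):
        candidate = propose_edit(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # strict improvement only; ties are rejected
            best, best_score = candidate, candidate_score
    return best, best_score

# Hypothetical stand-ins for the editor and the held-out validation set.
HINTS = ["Check units.", "Cite the source.", "Answer in JSON."]

def propose_edit(skill, rng):
    """Toy editor: append one randomly chosen hint to the skill text."""
    return skill + " " + rng.choice(HINTS)

def score(skill):
    """Toy validation score: reward each distinct hint once, penalise length."""
    return sum(h in skill for h in HINTS) - 0.001 * len(skill)

best, best_score = optimize_skill("Solve the task.", propose_edit, score)
print(best_score, best)
```

Because the artifact being optimized is a string, every accepted edit remains readable and can be diffed, which is the inspectability advantage the paragraph points to.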

Now the surprising twist, which is where this gets worth knowing: text skill descriptions may carry less semantic detail than you'd think. "Does instruction tuning teach task understanding or output format?" found that models trained on semantically empty or even *wrong* instructions perform about as well as those trained on full, correct ones; what transfers is knowledge of the output space, not the meaning of the words. If much of an instruction's payload is format rather than understanding, then compiling it to weights isn't discarding rich semantics; it's just relocating the part that was actually doing the work. This reframes the whole question: the 'detail' in a textual skill is partly an illusion of richness.

The deeper trade-off is about *where* you intervene. "Can editing hidden representations beat weight updates for finetuning?" and "Can decoding-time tuning preserve knowledge better than weight fine-tuning?" both warn that baking changes directly into weights can corrupt stored knowledge: proxy-tuning deliberately leaves base weights untouched to avoid exactly that, and ReFT edits frozen representations rather than weights for the same reason. So the risk with weight-space skills isn't that they lose the skill's detail; it's that the act of writing into weights can damage detail the model already had. The lesson across these notes: text skills win on legibility, portability, and not touching the base model; weight skills win on cost and composability; and the semantic 'loss' people fear is smaller than the practical trade-offs they overlook.
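Both "leave the weights alone" approaches can be sketched in a few lines. The forms used here are assumptions drawn from how these methods are commonly described, not from the notes themselves: proxy-tuning as adding the logit difference between a small tuned model and its untuned counterpart to the frozen base model's logits, and LoReFT as the hidden-state edit h + Rᵀ(Wh + b − Rh). All numbers and function names are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def proxy_tuned_probs(base_logits, expert_logits, antiexpert_logits):
    """Shift the frozen base model's logits by (tuned small - untuned small)."""
    shifted = [b + (e - a) for b, e, a in
               zip(base_logits, expert_logits, antiexpert_logits)]
    return softmax(shifted)

def loreft(h, R, W, b):
    """LoReFT-style edit of a hidden vector: h + R^T (W h + b - R h).

    R and W are r x d (rows of R assumed orthonormal); b has length r.
    """
    diff = [sum(w * x for w, x in zip(W[k], h)) + b[k]
            - sum(r * x for r, x in zip(R[k], h))
            for k in range(len(R))]
    return [h[j] + sum(R[k][j] * diff[k] for k in range(len(R)))
            for j in range(len(h))]

# Decoding-time shift over a toy 3-token vocabulary.
base = [2.0, 1.0, 0.0]         # large frozen model prefers token 0
expert = [0.0, 2.0, 0.0]       # small tuned model prefers token 1
antiexpert = [0.0, 0.0, 0.0]   # the same small model before tuning
probs = proxy_tuned_probs(base, expert, antiexpert)

# Rank-1 edit of a 2-d hidden state: set its first coordinate to 3.
h_new = loreft([1.0, 5.0], R=[[1.0, 0.0]], W=[[0.0, 0.0]], b=[3.0])
print(probs, h_new)
```

In both functions the base model's parameters never appear as something to be updated: one adjusts output logits and the other adjusts an activation, which is why neither can overwrite knowledge stored in the weights.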


Sources (5 notes)

Can skills work better as weights than as prompts?

LatentSkill uses a hypernetwork to convert textual agent skills into plug-and-play LoRA adapters, reducing prefill tokens by 64–72% while maintaining or beating in-context baselines. Weight-space skills form composable semantic structures that can be scaled and combined through parameter arithmetic.

Can skill documents be optimized like neural network weights?

SkillOpt demonstrates that skill documents can be systematically improved through a separate optimizer that proposes edits, accepting only changes that strictly improve held-out validation scores. This approach outperforms baselines across 52 experimental cells and produces skills that transfer between models.

Does instruction tuning teach task understanding or output format?

Models trained on semantically empty or deliberately incorrect instructions perform comparably to those trained on full, correct instructions (43% versus a random baseline of 42.6%). The semantic content of instructions appears largely irrelevant; what transfers is knowledge of the output space.

Can editing hidden representations beat weight updates for finetuning?

ReFT learns task-specific interventions on frozen model representations rather than updating weights, with LoReFT (low-rank linear subspace variant) dramatically outperforming LoRA across reasoning, instruction-following, and NLU benchmarks while using far fewer parameters.

Can decoding-time tuning preserve knowledge better than weight fine-tuning?

Proxy-tuning closes 88–91% of the alignment gap while surpassing direct fine-tuning on knowledge tasks by leaving base model weights untouched. Direct fine-tuning corrupts knowledge storage in lower layers, whereas proxy-tuning applies distributional shifts that primarily affect reasoning and style.

Next inquiring lines