Can skills work better as weights than as prompts?
Most agent systems store skills as text in prompts, but this inflates token costs and degrades model performance. Could compiling skills into trainable weight-space adapters instead offer a better trade-off between efficiency and capability?
Most agent-skill systems retrieve a relevant textual procedure and paste it into the prompt at each decision step. It is modular and simple, but it scales badly: the same skill text gets re-inserted across steps, inflating prefill cost, and long inputs degrade the model's ability to use all of it. LatentSkill's bet is that the substrate for skills is wrong. It uses a pretrained hypernetwork to compile a textual skill into a plug-and-play LoRA adapter, storing the procedure in weight space rather than context space — removing per-step skill tokens (64% fewer prefill tokens on ALFWorld, 72% lower skill-token overhead on Search-QA) while still beating the in-context baseline on success.
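The compile step can be sketched as a toy, under loud assumptions. `embed_skill` stands in for a real text encoder; here it is just a deterministic hash-seeded vector. The hypernetwork is a single fixed, untrained linear map from that embedding to the LoRA factors `A` (r × d_in) and `B` (d_out × r). The adapter follows the standard LoRA form ΔW = (α/r)·B·A. None of these names, sizes, or the linear hypernetwork come from LatentSkill. The sketch only shows that the skill ends up as matrices applied in the forward pass, not as tokens re-sent at every step.

```python
import hashlib
import random

D_IN, D_OUT, RANK, EMB = 4, 4, 2, 8  # toy sizes, chosen for illustration only


def embed_skill(text: str, dim: int = EMB) -> list[float]:
    """Hypothetical text encoder: a deterministic pseudo-embedding seeded by a hash."""
    seed = int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Plain-Python matrix-vector product m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class ToyHypernetwork:
    """Hypothetical, untrained stand-in: one fixed linear map from embedding to LoRA factors."""

    def __init__(self, seed: int = 0):
        rng = random.Random(seed)
        n_out = RANK * D_IN + D_OUT * RANK  # number of entries in A plus B
        self.weight = [[rng.gauss(0.0, 0.1) for _ in range(EMB)] for _ in range(n_out)]

    def compile(self, skill_text: str) -> tuple[list[list[float]], list[list[float]]]:
        flat = matvec(self.weight, embed_skill(skill_text))
        a_flat, b_flat = flat[: RANK * D_IN], flat[RANK * D_IN :]
        A = [a_flat[i * D_IN : (i + 1) * D_IN] for i in range(RANK)]  # RANK x D_IN
        B = [b_flat[i * RANK : (i + 1) * RANK] for i in range(D_OUT)]  # D_OUT x RANK
        return A, B


def lora_forward(W0, A, B, x, alpha: float = 4.0):
    """y = W0 x + (alpha / r) * B (A x): the skill enters through weights, not tokens."""
    base = matvec(W0, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / RANK
    return [b + scale * d for b, d in zip(base, delta)]


hyper = ToyHypernetwork()
A, B = hyper.compile("To heat an object: find a microwave, put the object in, turn it on.")
W0 = [[1.0 if i == j else 0.0 for j in range(D_IN)] for i in range(D_OUT)]  # identity base layer
y = lora_forward(W0, A, B, [1.0, 0.0, 0.0, 0.0])
```

Once the skill lives in `A` and `B`, each later step pays only for the extra low-rank matrix products, not for a re-prefill of the skill text. With `alpha=0.0` the layer reduces exactly to the base weights, so the adapter can be detached without touching the model.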
What makes this more than a compression trick is that the weight-space skills retain the properties that made textual skills useful. The generated LoRAs form a structured semantic geometry, can be dialed up or down via the LoRA scaling coefficient, and, when components are aligned, can be composed through parameter-space arithmetic. Skills become objects you add and weight, not strings you concatenate. A secondary benefit is that the skill is no longer exposed as plaintext in the prompt, which bears directly on protecting skill IP and on the prompt-injection surface.
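Scaling and adding can be sketched with dense updates. The sketch assumes the standard LoRA parameterization ΔW = (α/r)·B·A and treats composition as a weighted sum of per-skill ΔW matrices, in the style of task arithmetic. The helper names and the tiny hand-written adapters are hypothetical, and LatentSkill's exact composition rule may differ.

```python
Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def lora_delta(A: Matrix, B: Matrix, alpha: float, rank: int) -> Matrix:
    """Dense LoRA update: delta_W = (alpha / rank) * B @ A."""
    s = alpha / rank
    return [[s * v for v in row] for row in matmul(B, A)]


def compose(deltas: list[Matrix], coeffs: list[float]) -> Matrix:
    """Weighted sum of skill updates: sum_i c_i * delta_W_i."""
    rows, cols = len(deltas[0]), len(deltas[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for d, c in zip(deltas, coeffs):
        for i in range(rows):
            for j in range(cols):
                out[i][j] += c * d[i][j]
    return out


def add(m1: Matrix, m2: Matrix) -> Matrix:
    """Element-wise sum of two same-shaped matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


# Two hand-written rank-1 "skills" on a 2x2 layer (illustrative values only).
A1, B1 = [[1.0, 0.0]], [[1.0], [0.0]]
A2, B2 = [[0.0, 1.0]], [[0.0], [1.0]]
d1 = lora_delta(A1, B1, alpha=1.0, rank=1)  # [[1, 0], [0, 0]]
d2 = lora_delta(A2, B2, alpha=1.0, rank=1)  # [[0, 0], [0, 1]]

half_skill = compose([d1], [0.5])        # dial skill 1 down to half strength
mixed = compose([d1, d2], [0.5, 1.0])    # skill 1 at half strength plus skill 2
W0 = [[1.0, 0.0], [0.0, 1.0]]
W_adapted = add(W0, mixed)

# Adding the low-rank factors instead of the deltas is NOT equivalent:
# (B1 + B2) @ (A1 + A2) = B1 A1 + B2 A2 + B1 A2 + B2 A1, i.e. it picks up cross terms.
naive = matmul(add(B1, B2), add(A1, A2))
```

The last lines show why the weighted sum is taken over the merged updates rather than the factors. Summing `A` and `B` separately introduces the cross terms `B1·A2 + B2·A1`, which neither skill contributes on its own.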
This sits beside "Can models dynamically activate expert skills at inference time?": both reject context-space capability injection in favor of composable weight-space experts, though SVF composes existing expert vectors at test time while LatentSkill generates an adapter from a text skill on demand. It also reframes the harness/skill lifecycle. Where "Can skill documents be optimized like neural network weights?" keeps the skill as trainable text, LatentSkill argues the deployed form should be weights. The open risk is fidelity: hypernetwork-generated adapters may capture less nuance than the source text, and parameter-space composition only works when components happen to be aligned. Inspectability also drops, since a LoRA is far harder to audit than a skill.md, which cuts directly against the COLLEAGUE.SKILL governance argument.
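The note does not say how alignment is measured. One cheap weight-level heuristic, an assumption here and not LatentSkill's criterion, is the cosine similarity between two flattened skill updates ΔW. Values near zero mean the updates barely overlap in the weights. Strongly negative values mean that adding them partially cancels one skill. The measure says nothing direct about behavior, so it is at best a pre-screen before evaluating the composed adapter on tasks.

```python
import math

Matrix = list[list[float]]


def flatten(m: Matrix) -> list[float]:
    """Row-major flattening of a weight update."""
    return [v for row in m for v in row]


def delta_cosine(d1: Matrix, d2: Matrix) -> float:
    """Cosine similarity between two flattened weight updates (0.0 if either is all zeros)."""
    a, b = flatten(d1), flatten(d2)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


# Hypothetical dense updates for three skills on a 2x2 layer.
skill_a = [[1.0, 0.0], [0.0, 0.0]]
skill_b = [[0.0, 0.0], [0.0, 1.0]]
skill_c = [[-1.0, 0.0], [0.0, 0.0]]

print(delta_cosine(skill_a, skill_b))  # 0.0: the updates touch disjoint entries
print(delta_cosine(skill_a, skill_c))  # -1.0: adding them cancels skill_a's update
```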
Related concepts in this collection
- Can models dynamically activate expert skills at inference time?
  Can language models efficiently discover and compose task-specific capabilities on the fly without modifying base weights? This explores whether test-time adaptation through expert vector composition outperforms fixed fine-tuning approaches.
  convergent-with: both compose capability in weight space rather than context, here generating adapters from text skills on demand
- Can skill documents be optimized like neural network weights?
  Can natural-language skill documents be treated as trainable parameters and improved through iterative optimization with validation gating, similar to how model weights are tuned in deep learning?
  contrasts: keeps the skill as trainable text where LatentSkill argues the deployed form should be weights
- Can agents learn new skills without forgetting old ones?
  Explores whether externalized skill libraries (storing learned behaviors as retrievable code rather than parameter updates) can solve the catastrophic forgetting problem that plagues continual learning systems.
  extends: carries the composable-skill-library idea into parameter space via LoRA arithmetic
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents
- SkillOpt: Executive Strategy for Self-Evolving Agent Skills
- MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation
- Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
- Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents
- Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations
- Agent S: An Open Agentic Framework that Uses Computers Like a Human
- MetaClaw: Just Talk — An Agent That Meta-Learns and Evolves in the Wild
Original note title
moving agent skills from context space to weight space trades plaintext prompt overhead for composable LoRA adapters — skills become parameters you can scale and add