LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents
Agent systems increasingly use textual skills to encode reusable task procedures, but injecting these skills into the prompt at every step incurs substantial context overhead and exposes skill content as plaintext. We present LatentSkill, a framework that converts textual skills into plug-and-play LoRA adapters through a pretrained hypernetwork. LatentSkill stores skill knowledge in weight space rather than context space, removing per-step skill tokens while preserving modular loading, scaling, and composition. On ALFWorld and Search-QA, LatentSkill outperforms the corresponding in-context skill baseline while using substantially fewer prefill tokens: it improves ALFWorld success by 21.4 and 13.4 points on the seen and unseen splits with 64.1% fewer prefill tokens, and improves Search-QA exact match by 3.0 points with 72.2% lower skill-token overhead. Further analysis shows that generated skill LoRAs form a structured semantic geometry, can be precisely controlled via the LoRA scaling coefficient, and can be composed through parameter-space arithmetic when skill components are aligned. These findings suggest that weight-space skills provide an efficient, modular, and less exposed substrate for extending LLM agents.
Introduction. LLM agents increasingly solve complex tasks by interleaving reasoning, action, and feedback from external environments (Yao et al., 2023; Shinn et al., 2023; Zhao et al., 2024). To handle specialized and long-horizon tasks, many systems further rely on external skills: reusable textual procedures that encode task strategies, tool-use patterns, and recovery heuristics (Wang et al., 2023; Xia et al., 2026; Wu et al., 2026; Ouyang et al., 2026; Pan et al., 2026; Wang et al., 2026). A common design retrieves relevant skills from a skill library and inserts them into the prompt when the agent selects an action (Cho et al., 2026; Zhang et al., 2026). This design is simple and modular, but it becomes costly as interactions grow longer and skill libraries grow larger. The same skill text may be inserted repeatedly across decision steps, consuming context and increasing prefill cost; long inputs also make it harder for models to use all supplied information robustly (Jiang et al., 2023, 2024; Liu et al., 2024).
Discussion / Conclusion. LatentSkill converts textual agent skills into modular LoRA adapters through a pretrained hypernetwork, moving reusable procedural knowledge from context space into weight space. Across ALFWorld and Search-QA, this design improves over direct in-context skill prompting while substantially reducing the repeated prefill overhead introduced by skill text. Beyond efficiency, our analyses show that the generated skill LoRAs form a structured semantic geometry, can be controlled through the injection coefficient, and can be composed in parameter space when skill components are properly aligned. These results suggest that latent skill weights offer a practical substrate for building LLM agents whose skills are efficient, modular, controllable, and less directly exposed as plaintext prompts. This work evaluates LatentSkill on two agent benchmarks, ALFWorld and Search-QA, which cover embodied interaction and search-augmented question answering.
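To make the scaling and composition properties above concrete, the following is a minimal sketch of how skill LoRAs could be scaled and combined in parameter space. It is an illustration under stated assumptions, not the paper's implementation: the names `SkillLoRA` and `apply_skills` are hypothetical, each skill is assumed to be a standard low-rank pair (B, A) whose update is the product BA, and composition is assumed to be a coefficient-weighted sum of those updates added to a frozen base weight. The hypernetwork that would generate A and B from skill text is not modeled here.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a: Matrix, s: float) -> Matrix:
    """Multiply every entry of a by the scalar s."""
    return [[s * x for x in row] for row in a]


@dataclass
class SkillLoRA:
    """Hypothetical container for one skill adapter (illustrative only)."""
    A: Matrix  # shape (r x d_in)
    B: Matrix  # shape (d_out x r)

    def delta(self) -> Matrix:
        # Low-rank weight update B @ A, shape (d_out x d_in).
        return matmul(self.B, self.A)


def apply_skills(W: Matrix, skills: List[SkillLoRA], coeffs: List[float]) -> Matrix:
    """Return W + sum_i coeffs[i] * (B_i @ A_i) without mutating W.

    Assumption: composition is a weighted sum of per-skill updates.
    A coefficient of 0.0 switches a skill off; larger values strengthen it.
    """
    if len(skills) != len(coeffs):
        raise ValueError("need exactly one coefficient per skill")
    out = [row[:] for row in W]
    for skill, alpha in zip(skills, coeffs):
        out = add(out, scale(skill.delta(), alpha))
    return out


# Toy usage: two rank-1 skills on a 2x2 base weight.
W0 = [[0.0, 0.0], [0.0, 0.0]]
skill_a = SkillLoRA(A=[[1.0, 0.0]], B=[[1.0], [0.0]])
skill_b = SkillLoRA(A=[[0.0, 1.0]], B=[[0.0], [1.0]])
merged = apply_skills(W0, [skill_a, skill_b], [0.5, 2.0])
```

In this sketch the skill text never enters the prompt: loading, unloading, or reweighting a skill only changes the coefficients and adapters applied to the base weight, which is the sense in which per-step skill tokens are removed.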