INQUIRING LINE

Do learned workflows transfer between different agents with minimal accuracy loss?

This explores whether a skill or workflow one agent learns can be handed to a *different* agent and still work — and the corpus reframes the question around *where* the learning is stored, not just how well it copies.


This reads the question as: when an agent learns a useful routine, does that routine survive being moved to another agent (a different model, backbone, or user's system) without falling apart? The corpus has a sharp answer hiding inside a reframing — transfer works well precisely *because* the most successful systems don't store learned workflows in model weights at all. They externalize them as text or executable artifacts, which makes them portable by construction.

The most direct evidence is SkillOS ("Can a separate trained curator improve skill libraries better than frozen agents?"), which separates a *trainable curator* (which evolves the skill library) from a *frozen executor* (which runs the skills). Because the curator's output is a repository of execution logic rather than a fine-tuned set of weights, the trained curator was shown to generalize across *different executor backbones and domains*; that is, the workflows it produces aren't welded to the agent that helped create them. That's portability achieved by design rather than luck, although the note reports generalization rather than a measured accuracy loss.
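To make the curator/executor split concrete, here is a minimal sketch of a skill repository stored as plain data and loaded by two executors. It is not SkillOS's implementation: the `Skill`, `SkillLibrary`, and `FrozenExecutor` names, the JSON format, and the backbone labels are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Skill:
    """A learned routine stored as plain-text steps, not as weights."""
    name: str
    steps: list[str]


class SkillLibrary:
    """Hypothetical skill repository that round-trips through JSON."""

    def __init__(self) -> None:
        self.skills: dict[str, Skill] = {}

    def add(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def to_json(self) -> str:
        return json.dumps([asdict(s) for s in self.skills.values()])

    @classmethod
    def from_json(cls, payload: str) -> "SkillLibrary":
        library = cls()
        for item in json.loads(payload):
            library.add(Skill(**item))
        return library


class FrozenExecutor:
    """Stand-in for an agent whose parameters are never updated."""

    def __init__(self, backbone: str, library: SkillLibrary) -> None:
        self.backbone = backbone
        self.library = library

    def plan(self, skill_name: str) -> list[str]:
        # A real executor would prompt its model with these steps; here we
        # only return them to show that nothing in the skill is backbone-specific.
        return list(self.library.skills[skill_name].steps)


# The curator side writes the library once...
curated = SkillLibrary()
curated.add(Skill("file_bug_report",
                  ["reproduce the issue", "collect logs", "open a ticket"]))
payload = curated.to_json()

# ...and two executors with different (made-up) backbones load the same artifact.
executor_a = FrozenExecutor("backbone-a", SkillLibrary.from_json(payload))
executor_b = FrozenExecutor("backbone-b", SkillLibrary.from_json(payload))
print(executor_a.plan("file_bug_report") == executor_b.plan("file_bug_report"))
```

The point of the sketch is only that the learned artifact is a serializable object. How well a given backbone actually follows the steps is a separate, empirical question.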

The same externalization logic shows up under different names. VOYAGER stores skills in an embedding-indexed library and composes complex behaviors from simpler ones ("Can agents learn new skills without forgetting old ones?"); AgentFly does continual adaptation entirely through memory operations *without touching model parameters* ("Can agents learn continuously from experience without updating weights?"); and Agent Workflow Memory abstracts away example-specific values to induce reusable sub-task routines, notably with *larger* gains as the gap between training and test situations widens ("Can agents learn reusable sub-task routines from past experience?"). That last detail is the quiet surprise: a well-abstracted workflow can transfer *better* the more the new context differs, because abstraction is what strips out the parts that wouldn't have carried over. Transfer across *people* is the explicit goal of SkillClaw, which aggregates interaction trajectories from many users and synchronizes refined skills back system-wide ("How can agent systems share learned skills across users?").
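The abstraction step can be illustrated with a toy sketch: replace the example-specific values in a recorded trajectory with named slots, then refill the slots in a new context. This is not Agent Workflow Memory's induction procedure, which the note does not detail; the function names, the `{slot}` syntax, and the sample trajectory are hypothetical.

```python
def abstract_workflow(steps: list[str], example_values: dict[str, str]) -> list[str]:
    """Replace concrete values from one example with named slots."""
    abstracted = []
    for step in steps:
        for slot, value in example_values.items():
            step = step.replace(value, "{" + slot + "}")
        abstracted.append(step)
    return abstracted


def instantiate(workflow: list[str], values: dict[str, str]) -> list[str]:
    """Fill the slots of an abstracted workflow for a new situation."""
    filled = []
    for step in workflow:
        for slot, value in values.items():
            step = step.replace("{" + slot + "}", value)
        filled.append(step)
    return filled


# A made-up trajectory recorded in one context.
trajectory = [
    "type 'dry cat food' into the search box",
    "open the first result for 'dry cat food'",
    "add to cart with quantity 2",
]
routine = abstract_workflow(trajectory, {"query": "dry cat food", "quantity": "2"})

# The same routine reused where every concrete value differs.
reused = instantiate(routine, {"query": "usb-c cable", "quantity": "1"})
for step in reused:
    print(step)
```

Naive string replacement is enough to show the idea, but it would mis-fire whenever a value also appears as part of an unrelated word or number, so a real system would need a more careful way to identify which tokens are example-specific.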

But the corpus also marks the ceiling. Workflows that come from static expert demonstrations stay bounded by whatever the curator imagined: the agent never interacts with an environment during training, so it cannot learn from its own failures or repair a routine that doesn't fit a new agent's situation ("Can agents learn beyond what their training data shows?"). And transfer isn't free of side effects: in multi-agent settings, *where* a workflow sits matters, because high-influence positions amplify whatever signal flows through them, including malicious or sycophantic ones ("How does workflow position shape attack propagation in multi-agent systems?"). So a transplanted workflow can carry transplanted vulnerabilities.
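One crude way to see why position matters is to count how many downstream subtasks each node in a workflow graph can reach. The sketch below uses that count as a rough proxy for influence; it is not FLOWSTEER's metric, and the graph, node names, and `downstream_reach` helper are assumptions for illustration.

```python
from collections import deque


def downstream_reach(graph: dict[str, list[str]], start: str) -> set[str]:
    """Return every node reachable from `start` by following dependency edges."""
    seen: set[str] = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


# Hypothetical multi-agent workflow: each key feeds its output to the listed nodes.
workflow = {
    "planner": ["retriever", "coder"],
    "retriever": ["summarizer"],
    "coder": ["summarizer"],
    "summarizer": ["reporter"],
    "reporter": [],
}

# Number of downstream subtasks a signal injected at each node could touch.
reach = {node: len(downstream_reach(workflow, node)) for node in workflow}
for node, count in sorted(reach.items(), key=lambda item: -item[1]):
    print(node, count)
```

Under this proxy, a skill transplanted into the `planner` slot touches every other subtask, while the same skill placed at `reporter` touches none. Reach alone ignores how strongly each downstream agent trusts its inputs, so it should be read as a lower bound on the analysis a real audit would need.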

The thing you may not have known you wanted to know: the field's answer to 'do workflows transfer?' is really an argument about *substrate*. Store learning in weights and it's stuck to one agent; store it as an externalized, abstracted, embedding-indexed artifact and it becomes a portable object that a separate curator can even keep improving on someone else's behalf. If you want to go deeper, SkillOS and Agent Workflow Memory are the two doorways — one for cross-backbone transfer, one for why abstraction is what makes a routine survive the move.


Sources (7 notes)

Can a separate trained curator improve skill libraries better than frozen agents?

SkillOS shows that separating a trainable curator from a frozen executor, grouped by task streams, causes skill repositories to shift from generic verbose additions toward actionable execution logic and cross-task meta-strategies. The trained curator generalizes across different executor backbones and domains.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Can agents learn continuously from experience without updating weights?

AgentFly formalizes agent learning as a Memory-augmented MDP with three memory modules (case, subtask, tool) that enable credit assignment and policy improvement entirely through memory operations. The approach achieved 87.88% on GAIA validation without modifying LLM parameters.

Can agents learn reusable sub-task routines from past experience?

Agent Workflow Memory induces sub-task routines at finer granularity than full tasks, abstracts example-specific values, and compounds them hierarchically. This produces 24.6% relative gain on Mind2Web and 51.1% on WebArena, with larger gains as train-test gaps widen.

How can agent systems share learned skills across users?

SkillClaw aggregates interaction trajectories across users, processes them through an autonomous evolver that identifies patterns and refines skills, then synchronizes updates system-wide. This converts siloed individual learning into shared capability improvement without manual curation.

Can agents learn beyond what their training data shows?

Agents trained on static expert datasets cannot learn from their own failures or generalize beyond demonstrated scenarios because they never interact with environments during training. Competence is capped by what curators imagined, not by agent capacity.

How does workflow position shape attack propagation in multi-agent systems?

FLOWSTEER demonstrates that malicious signals propagate farther when injected into high-influence subtasks, and that framing them as evidence rather than instruction causes downstream agents to relay them. Influence concentrates where dependencies converge, making position-aware attacks far more effective.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

As an AI research analyst, test whether learned workflows transfer between different agents with minimal accuracy loss—treating this as still-open despite what older findings claimed.

What a curated library found—and when (dated claims, not current truth):

Findings span 2024–2026. A library of ~12 papers reveals a consensus that *externalizes* the problem:

• SkillOS (2026-05) decouples a trainable curator from a frozen executor; trained curators generalize across different executor backbones and domains because workflows are stored as executable artifacts, not weights.
• Agent Workflow Memory (2024-09) abstracts example-specific values into reusable sub-task routines; transfer *improves* as the gap between training and test contexts widens—abstraction strips non-portable elements.
• VOYAGER (implied in path) uses embedding-indexed skill libraries and compositional synthesis; AgentFly (implied) adapts entirely via memory operations without touching model parameters.
• SkillClaw (2026-04) aggregates interaction trajectories across users and synchronizes refined skills system-wide, targeting cross-user portability.
• Static expert demonstrations lock agents into curator imagination (2026 findings); multi-agent settings show workflow position amplifies malicious or sycophantic signals (FLOWSTEER, 2026-05), meaning transplanted workflows carry transplanted vulnerabilities.

Anchor papers (verify; mind their dates):
• SkillOS (arXiv:2605.06614, 2026-05)
• Agent Workflow Memory (arXiv:2409.07429, 2024-09)
• SkillClaw (arXiv:2604.08377, 2026-04)
• FLOWSTEER (arXiv:2605.11514, 2026-05)

Your task:
(1) RE-TEST EACH CONSTRAINT: For externalization-based transfer (SkillOS, VOYAGER), probe whether newer model scaling, in-context learning, or parameter-efficient tuning (LoRA, adapters) have since *reduced* the need for decoupling, or conversely, whether weight-based transfer now works better than the 2024–2026 papers claimed. Separately, test whether the vulnerability amplification finding (FLOWSTEER) still holds in post-2026 multi-agent orchestration with isolation layers or sandboxing.
(2) Surface the strongest *contradicting* work from the last ~6 months: has any recent paper shown that tightly coupled, weight-stored workflows outperform externalized ones, or that abstraction *hurts* transfer in specific regimes?
(3) Propose 2 research questions that assume the regime may have shifted: (a) Do modern LLMs with larger context windows and retrieval-augmented execution reduce or eliminate the advantage of pre-curated skill libraries? (b) In federated or edge-deployed agent networks, does workflow portability depend more on standardized serialization than on the internal representation (weights vs. artifacts)?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
