What makes agent-created code artifacts so hard to manage?
Agent-authored code that persists and is shared across systems raises difficult questions about what should be kept versus discarded, and how to maintain consistent state when multiple agents collaborate on the same artifacts.
Among the three elements of agentic code — model capability, harness infrastructure, and agent-initiated artifacts — the survey flags the third as the one that "remains relatively underexplored." Agent-initiated code artifacts are the interactive objects an agent creates, executes, observes, revises, persists, and shares during a task: patches and tests authored over a live repository, interface commands synthesized against DOM trees, hypothesis-testing pipelines composed on the fly, executable policies and skill libraries revised in response to environment feedback. These appear across coding assistance, GUI/OS automation, scientific discovery, and embodied control — yet they sit outside the well-mapped territory of predefined infrastructure.
The open questions cluster around persistence and sharing. When an agent writes code that outlives the current step, what should persist and what should be discarded? When multiple agents share artifacts, how is consistent state maintained, and how is a useful artifact promoted from one-off scratch work to durable, reviewable infrastructure? The survey's listed open challenges (evaluation beyond final task success, verification under incomplete feedback, regression-free harness improvement, consistent shared state across agents, and human oversight for safety-critical actions) converge on exactly this layer. The counterpoint is that some agent-authored code is genuinely disposable, and over-engineering its lifecycle wastes effort. The question still matters, because the artifacts an agent creates may be where the next gains in autonomy and coordination live, and they are precisely what current harness engineering least understands.
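To make the persistence and sharing questions concrete, here is a minimal sketch of one possible artifact lifecycle. It is not taken from the survey, which poses these as open problems. Every name in it (`ArtifactStore`, `Stage`, `StaleWriteError`, the three-stage scratch/candidate/durable model) is a hypothetical chosen for illustration. It assumes a single in-memory store, a version counter for detecting conflicting writes, and an explicit review step before promotion.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    SCRATCH = "scratch"      # one-off work, discarded when the task ends
    CANDIDATE = "candidate"  # reused at least once, awaiting review
    DURABLE = "durable"      # reviewed and promoted to shared infrastructure


class StaleWriteError(Exception):
    """Raised when a writer's view of an artifact is out of date."""


@dataclass
class Artifact:
    name: str
    body: str
    version: int = 1
    stage: Stage = Stage.SCRATCH
    reviewed: bool = False


class ArtifactStore:
    """Hypothetical in-memory store shared by several agents."""

    def __init__(self):
        self._items = {}

    def create(self, name, body):
        if name in self._items:
            raise KeyError(f"artifact {name!r} already exists")
        self._items[name] = Artifact(name, body)
        return self._items[name]

    def get(self, name):
        return self._items[name]

    def update(self, name, new_body, expected_version):
        # Optimistic concurrency: reject the write if another agent
        # changed the artifact since this writer last read it.
        art = self._items[name]
        if art.version != expected_version:
            raise StaleWriteError(
                f"{name}: expected v{expected_version}, found v{art.version}"
            )
        art.body = new_body
        art.version += 1
        art.reviewed = False  # a changed body needs a fresh review
        if art.stage is Stage.DURABLE:
            art.stage = Stage.CANDIDATE
        return art

    def record_reuse(self, name):
        # Reuse is the (assumed) signal that scratch work is worth keeping.
        art = self._items[name]
        if art.stage is Stage.SCRATCH:
            art.stage = Stage.CANDIDATE
        return art

    def approve(self, name):
        # Stands in for human or automated review.
        self._items[name].reviewed = True

    def promote(self, name):
        # Only reviewed candidates become durable infrastructure.
        art = self._items[name]
        if art.stage is Stage.CANDIDATE and art.reviewed:
            art.stage = Stage.DURABLE
            return True
        return False

    def end_task(self):
        """Discard scratch artifacts and return the names that were dropped."""
        dropped = [n for n, a in self._items.items() if a.stage is Stage.SCRATCH]
        for n in dropped:
            del self._items[n]
        return dropped
```

In this sketch, two agents that read the same version and both call `update` cannot silently overwrite each other: the second write raises `StaleWriteError` and that agent has to re-read and retry. Disposable code costs nothing to manage, because anything never reused stays in `SCRATCH` and is dropped by `end_task`. A real system would need durable storage, richer promotion criteria than a single reuse, and a review step with actual content, none of which this example attempts.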
Inquiring lines that use this note as a source (6)
This note is a source for these synthesized inquiries.
- How do standardized artifacts improve coordination between writing agents?
- When should agent-created code be promoted into permanent harness infrastructure?
- What prevents multiple agents from corrupting shared state in live artifacts?
- How do agents decide which created code should persist versus disappear?
- How should human oversight apply to persistent agent-authored code?
- What makes persistent, shared code artifacts from agents hard to manage at scale?
Related concepts in this collection (2)
- Can agents learn reusable sub-task routines from past experience?
  Do web agents fail at long-horizon tasks because they cannot extract and reuse workflows shared across similar problems? This explores whether sub-task abstraction enables skill accumulation rather than task-by-task problem solving.
  Relation to this note: a concrete case of persistent agent-authored artifacts (reusable routines) compounding over time.
- Can agents adapt without pausing service to users?
  Can deployed LLM agents continuously improve their capabilities while serving users without interruption? This explores whether fast behavioral updates and slow policy learning can coexist across different timescales.
  Relation to this note: addresses how agent-created skills should persist and be promoted, the lifecycle this note raises.
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Code as Agent Harness
- Agents of Chaos
- Agentic Code Reasoning
- From Model Scaling to System Scaling: Scaling the Harness in Agentic AI
- A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows
- Why Do Multi-agent LLM Systems Fail?
- Towards a Science of Scaling Agent Systems
- How we built our multi-agent research system
Original note title
agent-initiated code artifacts that persist and are shared are the underexplored frontier of harness engineering