INQUIRING LINE

AI agents now write their own code, but nobody has a clear answer for what they should save versus discard.

How do agents decide which created code should persist versus disappear?

This explores how autonomous agents manage the lifecycle of code they write themselves — what gets kept, reused, and shared versus discarded — which the corpus frames as memory management, skill curation, and an underexplored 'persistence' problem.

The corpus treats the choice of which agent-generated code survives across tasks less as a coding question and more as a *memory and curation* question. The starting point is that agent-authored code that persists and gets shared is the least-understood layer of the agentic stack [What makes agent-authored code worth persisting and sharing?]. It matters because code is not just an output to be regenerated on demand: it is an executable, inspectable, stateful medium the agent reasons through [Can code serve as the operational substrate for agent reasoning?]. Once you see code as a substrate rather than a deliverable, 'should this persist?' becomes a real decision with consequences.

The clearest mechanism for keeping code is the skill library. VOYAGER stores working code as named, reusable skills in an embedding-indexed library and composes complex skills out of simpler ones, which lets it keep learning without the catastrophic forgetting that weight-updating methods suffer [Can agents learn new skills without forgetting old ones?]. So the first answer to 'what persists' is: code that *worked*, as validated by environmental feedback, gets promoted into the library; everything else is scratch. But who does the promoting matters. SkillOS shows that handing curation to a *separately trained curator*, decoupled from the frozen executor that writes the code, shifts the library away from verbose generic additions toward sharp, actionable execution logic and cross-task meta-strategies [Can a separate trained curator improve skill libraries better than frozen agents?]. In other words, the keep/discard decision improves when it is a learned skill in its own right rather than a side effect of generation.
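The promote-on-validation idea can be sketched in a few lines. This is a minimal illustration, not VOYAGER's implementation: the `SkillLibrary` class, the `promote` and `retrieve` methods, the skill names, and the bag-of-words `embed` function (standing in for a learned embedding model) are all assumptions made for the example.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would use a learned embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class SkillLibrary:
    # Maps skill name -> (description vector, source code).
    skills: dict = field(default_factory=dict)

    def promote(self, name: str, description: str, code: str, passed: bool) -> bool:
        """Persist code only if the environment validated it; otherwise it stays scratch."""
        if not passed:
            return False
        self.skills[name] = (embed(description), code)
        return True

    def retrieve(self, query: str, k: int = 1) -> list:
        """Return the names of the k skills whose descriptions best match the query."""
        q = embed(query)
        ranked = sorted(self.skills, key=lambda n: cosine(q, self.skills[n][0]), reverse=True)
        return ranked[:k]


library = SkillLibrary()
library.promote("mine_wood", "collect wood logs from a tree", "def mine_wood(): ...", passed=True)
library.promote("craft_table", "craft a crafting table from wood planks", "def craft_table(): ...", passed=True)
library.promote("fly", "fly to the moon", "def fly(): ...", passed=False)  # failed validation, never stored

print(sorted(library.skills))                      # ['craft_table', 'mine_wood']
print(library.retrieve("get logs from a tree"))    # ['mine_wood']
```

In this sketch the keep/discard rule is a single boolean supplied by the caller. A trained curator of the kind SkillOS describes would replace that boolean with a learned decision about what to add, rewrite, or drop.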

The memory-management literature is wrestling with the same problem under different vocabulary. One framing splits the decision into two paths: an explicit 'hot path' where the agent itself decides via tool calls what to store or delete, and an implicit background path triggered programmatically. The two trade context-sensitivity against reliability [How should agents decide what memories to keep?]. DeepAgent pushes the autonomous side further with 'memory folding,' compressing past interactions into structured episodic, working, and tool schemas so the agent keeps what is strategically useful and sheds token overhead [Can agents compress their own memory without losing critical details?]. RAISE adds a useful nuance: memory (and by extension artifacts) decomposes by time scale and granularity, which suggests that different kinds of code should follow different retention policies rather than one global rule [How should agent memory split across time scales?].
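One way to picture the two paths, together with per-kind retention, is a store that exposes explicit tool-style calls alongside a scheduled sweep. This is a sketch under stated assumptions: the `Memory` class, the `remember`, `forget`, and `sweep` names, the "scratch" and "skill" categories, and the retention windows are hypothetical and do not come from any of the cited systems.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    value: str
    kind: str          # hypothetical categories: "scratch" or "skill"
    created_at: float  # seconds; passed in by the caller so the example is deterministic


# Hypothetical retention windows in seconds; None means keep until explicitly deleted.
TTL = {"scratch": 60.0, "skill": None}


@dataclass
class Memory:
    entries: dict = field(default_factory=dict)

    # Hot path: the agent invokes these as tools when it judges something worth keeping or dropping.
    def remember(self, key: str, value: str, kind: str, now: float) -> None:
        self.entries[key] = Entry(value, kind, now)

    def forget(self, key: str) -> None:
        self.entries.pop(key, None)

    # Background path: a scheduler calls this on a timer, with no agent judgment involved.
    def sweep(self, now: float) -> list:
        expired = [
            key for key, entry in self.entries.items()
            if TTL[entry.kind] is not None and now - entry.created_at > TTL[entry.kind]
        ]
        for key in expired:
            del self.entries[key]
        return expired


memory = Memory()
memory.remember("tmp_parser", "def parse(x): ...", kind="scratch", now=0.0)
memory.remember("retry_http", "def retry(fn): ...", kind="skill", now=0.0)

removed = memory.sweep(now=120.0)
print(removed)               # ['tmp_parser']
print(list(memory.entries))  # ['retry_http']
```

The hot path is context-sensitive but depends on the agent remembering to call it. The sweep is reliable but blind to context, which is the trade-off the two-path framing describes.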

The quietly surprising thread is *economic*. In a 115-day persistent-agent study, 82.9% of tokens were cache reads, which flips the accounting: when context and code persist and get reused, the meaningful cost unit stops being the token and becomes the completed artifact [Do persistent agents really cost less per token?]. That reframes the persist-versus-disappear decision: keeping code is not just a capability play, it is central to the economics of long-running agents. Persistence cuts the other way too: the same long-lived environment logged 889 governance events across 96 active days, with safeguards built directly into the memory layer the agent consults while deciding, so what persists is not only skills but also the rules constraining what is allowed to persist [Can governance rules embedded in runtime memory actually protect autonomous agents?].
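The shift from per-token to per-artifact accounting can be made concrete with a small calculation. Only the 82.9% cache-read share comes from the case study. The prices, the token total, and the artifact count below are hypothetical placeholders, and the model simplifies by charging every non-cache-read token at a single rate.

```python
CACHE_READ_SHARE = 0.829  # share of tokens that were cache reads, from the case study

# Everything below is a hypothetical placeholder, not a figure from the study.
PRICE_UNCACHED = 3.00    # dollars per million tokens processed normally
PRICE_CACHED = 0.30      # dollars per million tokens served as cache reads
TOTAL_TOKENS_M = 1000.0  # total tokens, in millions
ARTIFACTS = 50           # completed artifacts over the same period


def total_cost(tokens_m: float, cache_share: float,
               price_uncached: float, price_cached: float) -> float:
    """Blend cached and uncached token prices according to the cache-read share."""
    cached = tokens_m * cache_share
    return cached * price_cached + (tokens_m - cached) * price_uncached


with_cache = total_cost(TOTAL_TOKENS_M, CACHE_READ_SHARE, PRICE_UNCACHED, PRICE_CACHED)
without_cache = total_cost(TOTAL_TOKENS_M, 0.0, PRICE_UNCACHED, PRICE_CACHED)

print(f"cost per artifact with persistence:    ${with_cache / ARTIFACTS:.2f}")
print(f"cost per artifact without persistence: ${without_cache / ARTIFACTS:.2f}")
```

Under these placeholder numbers the token count is identical in both cases, yet the cost per artifact differs severalfold, which is why the token alone is a misleading unit once context persists.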

The corpus is candid that the deletion and lifecycle side is under-researched. We have good stories for *promotion* (skills, folding, curators), but the open challenges still cluster around persistence, sharing, and lifecycle, which is where the next gains in autonomy and coordination are expected to come from [What makes agent-authored code worth persisting and sharing?]. For a way into the trade-off itself, start with the two-path memory split and then read SkillOS to see what changes when curation becomes a trained skill rather than an afterthought.

Sources (9 notes)

What makes agent-authored code worth persisting and sharing?

Of three agentic code elements, agent-initiated artifacts that persist and are shared across agents remain underexplored. Open challenges cluster around lifecycle decisions, shared state consistency, and promotion from scratch work to durable infrastructure.

Can code serve as the operational substrate for agent reasoning?

Research shows code uniquely enables agent reasoning, action, and verification by being simultaneously executable, inspectable, and stateful. This unified code-centered loop improves reasoning and verification together compared to natural-language or prose-based approaches.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Can a separate trained curator improve skill libraries better than frozen agents?

SkillOS shows that separating a trainable curator from a frozen executor, grouped by task streams, causes skill repositories to shift from generic verbose additions toward actionable execution logic and cross-task meta-strategies. The trained curator generalizes across different executor backbones and domains.

How should agents decide what memories to keep?

Memory management decomposes into explicit hot-path (agent decides via tool calling) and implicit background (programmatically triggered) paths. Each approach trades context-sensitivity for reliability differently across generation, storage, retrieval, and deletion.

Can agents compress their own memory without losing critical details?

DeepAgent's autonomous memory folding consolidates interaction history into episodic, working, and tool memory schemas. This reduces token overhead while letting agents pause to reconsider strategies—the autonomy and structure together avoid degradation that plagues poorly designed consolidation.

How should agent memory split across time scales?

RAISE shows that agent memory consists of four components organized by two design axes: dialogue-level (conversation history, scratchpad) versus turn-level (examples, task trajectory). This granularity distinction predicts different failure modes and update policies for each component.

Do persistent agents really cost less per token?

A 115-day case study found 82.9% of tokens were cache reads. When context persists and is reused, the meaningful cost denominator becomes completed artifacts, not individual tokens.

Can governance rules embedded in runtime memory actually protect autonomous agents?

A persistent agent recorded 889 governance events across 96 active days, with safeguards encoded directly into the memory layer the agent consulted during operation. Runtime-resident governance proved more effective than external policies because the agent actually accessed it during decision-making.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI (4.05 match)
Useful Memories Become Faulty When Continuously Updated by LLMs (3.36 match)
MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation (3.34 match)
GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents (3.32 match)
Are We Ready For An Agent-Native Memory System? (2.53 match)
Toward Efficient Agents: A Survey of Memory, Tool Learning, and Planning (2.48 match)
Agents of Chaos (2.41 match)
SkillOS: Learning Skill Curation for Self-Evolving Agents (1.77 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst tracking agent memory, code persistence, and curation mechanisms. The core question: *what determines whether agent-generated code survives across tasks or gets discarded, and how does that decision reshape agent architecture?*

What a curated library found — and when (dated claims, not current truth):
Findings span 2025–05 through 2026–05. The library identified:
- Skill libraries (e.g., VOYAGER-style) persist code validated by environmental feedback; RL-trained curators (decoupled from executors) shift libraries from verbose to sharp, actionable logic (SkillOS, 2026–05).
- Agent memory splits into explicit 'hot path' (agent-initiated tool calls) and implicit background paths (programmatic triggers); autonomous memory folding compresses interactions into episodic/working/tool schemas (DeepAgent, 2025–10).
- Memory decomposes by time scale and granularity, predicting different retention policies for different code types (RAISE framework, circa 2026).
- In a 115-day persistent-agent study, 82.9% of tokens were cache reads; the economic unit shifted from cost-per-token to cost-per-completed-artifact (2026–05).
- Governance rules embedded in the memory layer the agent consults constrain what persists; deletion and lifecycle management remain under-researched.

Anchor papers (verify; mind their dates):
- arXiv:2606.06614 (SkillOS: Learning Skill Curation for Self-Evolving Agents, 2026–05)
- arXiv:2510.21618 (DeepAgent: A General Reasoning Agent with Scalable Toolsets, 2025–10)
- arXiv:2605.26870 (Persistent AI Agents in Academic Research: A Single-Investigator Implementation Case Study, 2026–05)
- arXiv:2604.08377 (SkillClaw: Let Skills Evolve Collectively with Agentic Evolver, 2026–04)

Your task:
(1) **RE-TEST EACH CONSTRAINT.** For each finding above — skill promotion, curator decoupling, dual-path memory, memory folding, time-scale decomposition, cache economics, embedded governance — assess whether newer model scaling, improved training regimes for curators, multi-agent orchestration (where code artifacts move between agents), or formal verification tools have since relaxed or overturned the limitation. Separate the durable question (how do agents *reason about persistence*?) from perishable claims (e.g., 'curators must be separately trained' — can end-to-end training now do it?). Cite what resolved it.
(2) **Surface the strongest contradicting or superseding work from the last ~6 months.** Does any recent paper argue that persistence is *harmful* (e.g., code drift, stale assumptions) or that stateless re-generation is now cheaper/safer?
(3) **Propose 2 research questions that assume the regime may have moved:** e.g., 'If multi-agent code-sharing becomes standard, how do agents vet code written by peers?' or 'Can a single learned pruning policy replace hand-tuned time-scale decomposition?'

Cite arXiv IDs; flag anything you cannot ground in a real paper.
