How should abstraction preserve applicability conditions when distilling experience?
This explores the central tension in turning past experience into reusable knowledge: abstraction works by throwing away specifics, but the conditions under which a lesson holds are themselves a kind of specific. Distill too aggressively and you keep the rule while losing the 'only when…' attached to it. The corpus circles this problem from several angles, and the most useful thread is that good abstraction is *selective*, not *maximal*: it discards example-specific values while deliberately retaining the structural context that signals applicability. Agent Workflow Memory is the cleanest illustration. It abstracts away example-specific values (this URL, that button) but preserves the sub-task routine as a unit, so the routine carries its own 'this is the shape of situation I belong to.' Its gains grow as train-test gaps widen (see: Can agents learn reusable sub-task routines from past experience?), which is the signature of an abstraction that generalized without over-generalizing.
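The selective abstraction described above can be sketched in a few lines. This is a minimal illustration, not Agent Workflow Memory's actual implementation: the `Step` and `Routine` types, the `abstract_routine` and `applicable` helpers, and the idea of representing structural context as a set of page cues are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str  # e.g. "type" or "click"
    target: str  # structural role of the element; kept during abstraction
    value: str   # example-specific literal; replaced during abstraction


@dataclass(frozen=True)
class Routine:
    name: str
    applies_when: frozenset  # structural cues that must be present (hypothetical)
    steps: tuple             # steps with named slots instead of literals


def abstract_routine(name, steps, page_cues):
    """Replace literals with named slots, but keep step order and page cues."""
    slotted = tuple(
        Step(s.action, s.target, "{" + s.target + "}" if s.value else "")
        for s in steps
    )
    return Routine(name, frozenset(page_cues), slotted)


def applicable(routine, observed_cues):
    """A routine applies only if every retained cue is observed."""
    return routine.applies_when <= set(observed_cues)


# One concrete trajectory, then its abstraction.
trace = [
    Step("type", "search_box", "red shoes"),
    Step("click", "search_button", ""),
]
routine = abstract_routine("search_catalog", trace, {"search_box", "search_button"})
```

The literal "red shoes" is gone from `routine`, but the set of cues it was induced from stays attached, so `applicable(routine, cues)` can refuse to fire on a page that lacks a search box.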
The failure mode at the other end is worth naming, because it's what 'preserving applicability conditions' is defending against. Chain-of-thought, on one reading, is abstraction gone wrong: it reproduces the *form* of a reasoning pattern learned in training while losing the conditions under which that pattern is valid, so performance degrades predictably under distribution shift (see: Does chain-of-thought reasoning reveal genuine inference or pattern matching?). That degradation is the diagnostic. An abstraction that has shed its applicability conditions looks fine until the situation drifts, then breaks silently. The same shape shows up in memory work as 'brevity bias' and 'context collapse': when you compress a playbook by rewriting it wholesale, you erase the hard-won detail that told you when each move was right. ACE's answer is to grow contexts through incremental generation-reflection-curation rather than full rewrites, treating the playbook as something you *append conditions to* rather than *summarize away* (see: Can context playbooks prevent knowledge loss during iteration?).
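The contrast between appending conditions and summarizing them away can be made concrete. The sketch below is an assumption-laden toy, not the ACE framework's code: `Entry`, `Playbook`, and `curate` are hypothetical names, and conditions are plain strings joined with a semicolon.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    advice: str
    when: str  # applicability condition, stored alongside the advice


@dataclass
class Playbook:
    entries: dict = field(default_factory=dict)

    def curate(self, deltas):
        """Merge small deltas; the playbook is never regenerated wholesale."""
        for key, advice, when in deltas:
            if key not in self.entries:
                self.entries[key] = Entry(advice, when)
            else:
                # Existing entry: accumulate the new condition, do not overwrite.
                entry = self.entries[key]
                if when not in entry.when:
                    entry.when = f"{entry.when}; {when}"


pb = Playbook()
pb.curate([("retry", "Retry the request", "on HTTP 503")])
pb.curate([("retry", "Retry the request", "at most 3 times")])
```

After both calls the single `retry` entry carries both conditions. A wholesale rewrite that shortened the entry to just "Retry the request" would be the brevity bias the paragraph above describes.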
There's a structural counterpoint that reframes the whole question. LLM Programs deliberately hide step-irrelevant context, showing each call only what it needs (see: Can algorithms control LLM reasoning better than LLMs alone?). That sounds like the opposite of preserving conditions, but it isn't. The applicability condition there lives *outside* the abstraction, encoded in the surrounding algorithm's control flow rather than inside the distilled step. This is a genuine design fork: do you bake the 'when' into the abstraction itself (AWM's self-describing routines), or do you keep abstractions context-free and let an external scaffold decide when to invoke them? DeepAgent's memory folding splits the difference, consolidating history into typed schemas (episodic, working, tool) where the schema type itself is a coarse applicability tag (see: Can agents compress their own memory without losing critical details?).
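A small sketch of the external-scaffold side of that fork, under stated assumptions: `summarize_step` and `run_program` are hypothetical names, the length threshold is an arbitrary stand-in for a real applicability condition, and the model is a stub callable rather than any real LLM API.

```python
from typing import Callable, List


def summarize_step(llm: Callable[[str], str], passage: str) -> str:
    # Context-free step: it sees only the passage, not why it was invoked.
    return llm(f"Summarize: {passage}")


def run_program(llm: Callable[[str], str], passages: List[str],
                max_chars: int = 40) -> List[str]:
    """The 'when' lives here: only long passages are routed to the step."""
    out = []
    for passage in passages:
        if len(passage) > max_chars:  # applicability condition in control flow
            out.append(summarize_step(llm, passage))
        else:
            out.append(passage)
    return out


# Stub model that records the prompts it receives (stand-in for a real call).
calls = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return "summary"


result = run_program(fake_llm, ["short", "x" * 50])
```

The step itself contains no condition at all. Delete `run_program` and nothing remains that says when summarizing is appropriate, which is what it means for the condition to live in the scaffold.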
The deeper reason this matters connects to a separate finding the reader might not expect to be relevant: RL post-training seems to teach models *when* to deploy reasoning, not *how*. The capability pre-exists, and what's learned is the routing (see: Does RL post-training create reasoning or just deploy it?). If that's right, then 'applicability conditions' aren't a side-constraint on distilled experience; they're the *main thing being learned*. The abstraction (the reasoning move) was already latent; the valuable distillate is the trigger. AgentFly makes this operational, doing all of its continual learning through memory operations that handle credit assignment, that is, learning which stored case applies to which new situation, without touching model weights (see: Can agents learn continuously from experience without updating weights?).
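To make 'learning the trigger through memory operations' concrete, here is a toy case memory in which only the routing estimate changes with experience. It is a sketch under loud assumptions and not AgentFly's method: `Case`, `CaseMemory`, the Jaccard similarity, and the running-average utility update are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Case:
    cues: frozenset       # features of the situation where the case was stored
    plan: str
    utility: float = 0.5  # running estimate of how well the case transfers


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class CaseMemory:
    def __init__(self, lr=0.5):
        self.cases = []
        self.lr = lr

    def write(self, cues, plan):
        case = Case(frozenset(cues), plan)
        self.cases.append(case)
        return case

    def read(self, cues):
        """Pick the case whose stored situation best matches, weighted by utility."""
        cues = frozenset(cues)
        return max(self.cases, key=lambda c: jaccard(c.cues, cues) * c.utility)

    def credit(self, case, reward):
        """Credit assignment: move the case's utility toward the observed reward."""
        case.utility += self.lr * (reward - case.utility)


mem = CaseMemory()
a = mem.write({"login", "form"}, "fill in stored credentials")
b = mem.write({"login", "captcha"}, "hand off to the user")
query = {"login", "form"}
first_choice = mem.read(query)
```

If the first choice keeps failing, repeated `mem.credit(a, 0.0)` calls lower its utility until `mem.read(query)` routes to the other case. The plans are never edited, only the estimate of when each one applies, and no model parameters are involved.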
The takeaway a curious reader can leave with: 'preserve applicability conditions' is not a footnote to abstraction — it may be the harder and more valuable half. An abstraction without its conditions is just a pattern waiting to misfire on the next distribution shift. The corpus suggests three viable disciplines — keep conditions inside the unit (self-describing routines), keep them in an external scaffold (program control flow), or keep them in a typed memory index (folded schemas) — but all three agree on the negative: never compress the 'when' away to make the 'what' shorter.
Sources (7 notes)
Agent Workflow Memory induces sub-task routines at finer granularity than full tasks, abstracts example-specific values, and compounds them hierarchically. This produces 24.6% relative gain on Mind2Web and 51.1% on WebArena, with larger gains as train-test gaps widen.
CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.
The ACE framework treats contexts as evolving playbooks using generation-reflection-curation loops rather than full rewrites. This prevents knowledge loss from compression and detail erosion, achieving +10.6% on agentic tasks and +8.6% on finance without labeled supervision.
LLM Programs embed LLMs within explicit algorithms that manage control flow and state, presenting only step-specific context to each LLM call. This information hiding addresses capability and context window limits while treating complex reasoning as modular, debuggable sub-tasks.
DeepAgent's autonomous memory folding consolidates interaction history into episodic, working, and tool memory schemas. This reduces token overhead while letting agents pause to reconsider strategies—the autonomy and structure together avoid degradation that plagues poorly designed consolidation.
Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.
AgentFly formalizes agent learning as a Memory-augmented MDP with three memory modules (case, subtask, tool) that enable credit assignment and policy improvement entirely through memory operations. The approach achieved 87.88% on GAIA validation without modifying LLM parameters.