Does state-indexed memory outperform high-level workflow memory for web agents?
Should procedural memory for web agents be organized around specific environment states and actions, or abstracted into higher-level workflows? This matters because web automation demands precise, context-sensitive recall that workflows might lose.
PRAXIS distinguishes two kinds of agent knowledge: facts (atomic and context-independent at any given moment) and procedures (state-dependent sequences of actions). It argues that procedures are at least as important as facts for real-world deployment, yet remain underexplored compared with factual memory frameworks such as Mem0 and Letta.
The standard alternative — a-priori procedural specification, where humans write SOPs included in the agent's context — fails for three structural reasons. Many procedures are not fully documented because humans learn by observation rather than reading SOPs. Enumerating all states and edge cases in a combinatorial space is intractable. And procedures become obsolete quickly as environments change. The brittleness intensifies as AI design tools generate novel interfaces that push agents into out-of-distribution states.
PRAXIS's response is a-posteriori learning of procedures from demonstrations or experience, indexed by environment state. Agent Workflow Memory (see Can agents learn reusable sub-task routines from past experience?), Synapse, and ExpeL abstract successful trajectories into high-level natural-language workflows. PRAXIS differs by performing local state-based recall, grounded primarily in the live environment state and secondarily in the goal. Memories are indexed with explicit state and action descriptors rather than high-level trajectory summaries, which enables precise recall of the minute details that web environments require.
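The recall scheme described above can be sketched in a few lines. This is a minimal illustration, not PRAXIS's implementation: the `Memory` fields, the token-overlap (Jaccard) similarity, the `state_weight` value, and the example memories are all assumptions made for this sketch. It shows only the ordering of priorities, with state similarity dominating the score and goal similarity acting as a secondary signal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    """One stored step (hypothetical schema, not taken from the paper)."""
    state: str   # descriptor of the environment state when the action was taken
    action: str  # descriptor of the action taken in that state
    goal: str    # goal the agent was pursuing at the time


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for whatever matcher a real system uses."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(memories: list[Memory], state: str, goal: str,
             k: int = 2, state_weight: float = 0.8) -> list[Memory]:
    """Return the k memories that best match the live state, then the goal."""
    def score(m: Memory) -> float:
        return (state_weight * jaccard(m.state, state)
                + (1.0 - state_weight) * jaccard(m.goal, goal))
    return sorted(memories, key=score, reverse=True)[:k]


# Toy memories, invented for illustration only.
MEMORIES = [
    Memory("checkout page with shipping form and disabled submit button",
           "fill postal code field then click submit", "buy a laptop"),
    Memory("search results page with filter sidebar",
           "click the price filter", "order groceries"),
]

# The live state matches the first memory; the goal matches the second.
best = retrieve(
    MEMORIES,
    state="checkout page with shipping form and disabled submit button",
    goal="order groceries",
    k=1,
)[0]
```

With these weights, `best` is the checkout memory even though its goal differs from the query goal, because the state match dominates. Raising `k` corresponds to the retrieval breadth the note refers to.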
This is a direct architectural disagreement with CLIN (see Can frozen language models continually improve through memory structure alone?), which holds that causal-rule memory transfers best, and with AWM, which holds that workflow-routine memory compounds best. All three address the same question, "what shape should agent memory take?", and give different answers: causal rules, abstracted workflows, or local state-action pairs. PRAXIS argues that the first two abstract too far from the specifics web automation demands.
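To make the three candidate shapes concrete, the sketch below stores one checkout experience in each form. The class and field names are hypothetical and are not taken from the CLIN, AWM, or PRAXIS papers; the point is only how much interface detail each shape retains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CausalRule:
    """Causal-rule shape: a reusable cause-effect statement (illustrative)."""
    condition: str
    effect: str


@dataclass(frozen=True)
class Workflow:
    """Abstracted-workflow shape: a named multi-step routine (illustrative)."""
    name: str
    steps: tuple[str, ...]


@dataclass(frozen=True)
class StateActionMemory:
    """Local state-action shape: one action tied to one observed state (illustrative)."""
    state: str
    action: str
    goal: str


# The same invented experience, stored three ways.
rule = CausalRule(condition="shipping form is incomplete",
                  effect="submit button stays disabled")
workflow = Workflow(name="complete checkout",
                    steps=("fill shipping details", "submit order"))
local = StateActionMemory(
    state="checkout page, postal code field empty, submit button disabled",
    action="type postal code into the postal code field",
    goal="buy a laptop",
)
```

In this toy case only `local` names the exact field to act on, which is the kind of click-level detail the disagreement is about.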
Empirically, integrating state-dependent memory into the Altrina web agent yields consistent improvements on the REAL benchmark across diverse VLM backbones: higher average accuracy, higher best-of-5, better reliability, and fewer steps to completion. An ablation shows that the gains increase with retrieval breadth k. The structural claim is that reusable local state-to-action priors, rather than abstracted workflows that transfer the gist but lose click-by-click specifics, are what drive robust, generalizable behavior.
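For readers unfamiliar with the reported metrics, the sketch below shows one plausible way to compute two of them from repeated runs. The definitions are assumptions, since this note does not define them: average accuracy is taken as the success rate over all individual runs, and best-of-5 as the fraction of tasks solved in at least one of five attempts. The `RUNS` data is invented for illustration and is not a benchmark result.

```python
def average_accuracy(runs: dict[str, list[bool]]) -> float:
    """Success rate over every individual run of every task (assumed definition)."""
    outcomes = [ok for attempts in runs.values() for ok in attempts]
    return sum(outcomes) / len(outcomes)


def best_of_n(runs: dict[str, list[bool]], n: int = 5) -> float:
    """Fraction of tasks solved at least once in the first n attempts (assumed definition)."""
    solved = sum(any(attempts[:n]) for attempts in runs.values())
    return solved / len(runs)


# Toy outcomes: task name -> success flag for each of five attempts.
RUNS = {
    "task_a": [True, False, True, True, False],
    "task_b": [False, False, False, False, False],
    "task_c": [False, False, False, True, False],
}

avg = average_accuracy(RUNS)   # 4 successes out of 15 runs
bo5 = best_of_n(RUNS, n=5)     # 2 of 3 tasks solved at least once
```

The two numbers can diverge sharply: a task solved once in five attempts counts fully toward best-of-5 but contributes only one success to the average, which is why the note lists them separately.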
This note is tagged type: tension because the disagreement with AWM and CLIN is real and unresolved. See ops/tensions/agent memory granularity tension across AWM CLIN and PRAXIS for the cross-paper tension capture.
Inquiring lines that use this note as a source (19)
This note is a source for these synthesized inquiries.
- How does spatial density in web UIs break workflow-level memory?
- How do the six memory components combine across explicit and implicit paths?
- Why does GUI agent memory need different abstraction levels?
- How should headers index procedural intent differently from keyword chunking?
- Can state-indexed memory retrieval breadth predict gains in web agent robustness?
- What makes a memory reachable in the right context?
- How does procedural memory granularity affect web agent performance?
- How does workflow abstraction compare to state-indexed procedural memory for web agents?
- What is the right granularity level for agent memory to enable both reuse and composition?
- Does workflow-level memory or state-action memory better capture reusable agent knowledge?
- How do strategy-level abstractions differ from storing raw task workflows?
- How do memory-resident safeguards get surfaced at the exact decision point where they matter?
- Can memory workspaces resolve contradictory evidence that stateless systems miss?
- When should architects prioritize consolidation compute over larger context windows?
- What makes memory curation harder to solve than simply expanding storage?
- How should memory systems split between short-term and long-term storage?
- Can the same compress-then-act pattern work for agent state memory?
- Can externalizing bookkeeping to a stateful harness replace internalized memory control?
- What specific bookkeeping tasks can environments maintain more reliably than policies?
Related concepts in this collection (5)
- Can agents learn reusable sub-task routines from past experience?
  Do web agents fail at long-horizon tasks because they cannot extract and reuse workflows shared across similar problems? This explores whether sub-task abstraction enables skill accumulation rather than task-by-task problem solving.
  contradicts: PRAXIS argues local state-action recall beats workflow abstraction; AWM argues abstracted sub-task workflows compound and transfer best. Direct architectural disagreement on memory granularity.
- Can frozen language models continually improve through memory structure alone?
  If agents can't update parameters, what form of textual memory lets them keep learning across trials and transfer to new tasks without retraining?
  contradicts: CLIN advocates causal-rule abstractions; PRAXIS argues abstractions of any kind lose the specifics web environments need. Three-way memory-shape tension when paired with AWM.
- How can GUI agents adapt when software constantly changes?
  Can desktop automation agents stay current by combining real-time web documentation with learned task patterns and concrete execution memories? This explores how to avoid training obsolescence in open-world software environments.
  partial agreement: Agent S's episodic memory is closer to PRAXIS than its narrative memory; PRAXIS would predict Agent S's gains come from the episodic layer and the narrative layer is dispensable for web automation.
- Can agents learn preferences by watching rather than asking?
  Explores whether multimodal agents can build accurate preference models through continuous observation of user behavior, without explicit instruction, by organizing memory around entities and separating concrete events from derived knowledge.
  complements: M3-Agent splits episodic vs semantic at the storage layer; PRAXIS focuses on the procedural memory dimension that M3-Agent leaves underspecified.
- Why do LLM agents ignore condensed experience summaries?
  LLM agents faithfully learn from raw experience but systematically disregard condensed summaries of the same experience. This study investigates whether the problem lies in how summaries are made, how models process them, or whether models simply don't need them.
  supports: agents systematically ignore condensed experience, which would predict that high-level workflow abstractions degrade exactly the way PRAXIS observes, which is supporting evidence for state-dependent local memory over abstracted summaries.
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Real-Time Procedural Learning From Experience for AI Agents
- Why Do Multi-agent LLM Systems Fail?
- Agent Workflow Memory
- Agentic Code Reasoning
- LLMs Corrupt Your Documents When You Delegate
- Toward Efficient Agents: A Survey of Memory, Tool Learning, and Planning
- From Model Scaling to System Scaling: Scaling the Harness in Agentic AI
- Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Original note title
state-dependent procedural memory beats workflow-level memory for web agents — local state-action recall captures details that high-level trajectory abstractions lose