TOPIC

LLM Memory

26 synthesis notes · 56 source papers
Can LLMs read long documents like humans do?

How might mimicking human reading strategies—storing gist memories and looking up details on demand—help language models handle documents beyond their effective context window?

Is agent memory a storage problem or a connectivity problem?

Most systems treat memory as a repository to store and retrieve. But what if memory's real usefulness depends on how units are linked together rather than what is stored?

How should agents decide what memories to keep?

Agent memory management splits into two approaches: agents that autonomously recognize important information, and programmatic triggers. Understanding this choice reveals why different memory architectures prioritize different information types.

Should agent memory adapt dynamically based on execution feedback?

Can agents improve performance by continuously reshaping memory connections in response to whether tasks succeed or fail, rather than relying on fixed retrieval pipelines? This matters because static memory degrades in changing environments.

Can three axes replace the short-term/long-term memory split?

Does breaking agent memory into forms, functions, and dynamics provide a clearer framework than the traditional short-term/long-term distinction? This matters because current agent-memory literature lacks a unified vocabulary, making comparison between systems nearly impossible.

How should agent memory split across time scales?

Explores whether agent working memory should be organized by temporal scope—some components persisting across a conversation, others refreshed each turn. Understanding this distinction could reveal why some memory designs fail.

Can retrieval knowledge compress into a tiny parametric model?

Can the information stored in large non-parametric retrieval datastores be compressed into a small trainable module? This matters because it could combine retrieval's knowledge benefits with the speed of pure parametric methods.

Can a single model replace retrieval for long-term conversation memory?

COMEDY proposes collapsing the standard retrieval pipeline into one unified model that generates, compresses, and responds. But does eliminating the retriever actually improve performance, or does compression lose critical information?

Can lookup memory and computation work together better than either alone?

Mixture-of-Experts handles dynamic logic, but static knowledge might need a different mechanism. Can a hybrid approach combining conditional computation with fast lookup outperform pure sparse models?

Can models consolidate memories during offline sleep phases?

This explores whether LLMs can use dedicated offline periods to consolidate short-term learning into permanent weights, avoiding catastrophic forgetting and the need for expensive retraining.

Does agent memory degrade when continuously consolidated?

Can consolidating agent experiences into summaries actually harm long-term performance? Research on ARC-AGI tasks suggests continuous memory updates may reduce capability below the no-memory baseline.

Why do time-based queries fail in conversational retrieval systems?

Conversational memory systems struggle with questions that reference when something was discussed rather than what was said. Standard vector databases lack temporal indexing to retrieve by metadata like date, speaker, or session order.
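
As a rough illustration of the gap described above, the sketch below stores date, speaker, and session order next to each embedding and filters on them before ranking by similarity. Everything here is an assumption made for illustration: the `Turn` record, the `search` helper, and the toy two-dimensional embeddings are hypothetical and do not come from any particular vector database or from the underlying note.

```python
from dataclasses import dataclass
from datetime import date
import math


@dataclass
class Turn:
    """One stored utterance with the metadata a time-based query needs (hypothetical schema)."""
    text: str
    speaker: str
    day: date
    session: int
    embedding: list[float]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(turns, query_embedding, *, speaker=None, since=None, until=None, session=None, k=3):
    """Filter by metadata first, then rank the survivors by embedding similarity."""
    candidates = [
        t for t in turns
        if (speaker is None or t.speaker == speaker)
        and (since is None or t.day >= since)
        and (until is None or t.day <= until)
        and (session is None or t.session == session)
    ]
    candidates.sort(key=lambda t: cosine(t.embedding, query_embedding), reverse=True)
    return candidates[:k]


# Toy data: the embeddings are made up, not produced by a real model.
turns = [
    Turn("I adopted a dog.", "alice", date(2024, 1, 5), 1, [1.0, 0.0]),
    Turn("The dog learned to sit.", "alice", date(2024, 3, 2), 2, [0.9, 0.1]),
    Turn("My cat is asleep.", "bob", date(2024, 3, 2), 2, [0.0, 1.0]),
]

# "What did we say about the dog in March?" needs the date filter, not just similarity.
hits = search(turns, [1.0, 0.0], since=date(2024, 3, 1), k=1)
print(hits[0].text)
```

Without the `since` filter, the same query would rank the January turn first, which is the failure mode the note describes: similarity alone cannot express when something was said.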

Can agents learn better from their failures than successes?

Does storing reasoning strategies extracted from both successful and failed experiences improve agent learning compared to tracking only successes or raw trajectories? This matters because failures offer preventative lessons that successes alone cannot teach.

Can brain memory systems explain how LLMs should store knowledge?

This explores whether the brain's three-tier memory architecture—neocortex, hippocampus, and prefrontal cortex—maps onto transformer weights, external knowledge stores, and agentic state. Understanding this mapping could reveal which AI memory problems each tier solves and which it cannot.

When do language models stop memorizing and start generalizing?

Can we measure the exact capacity limit where models transition from memorizing training data to learning underlying patterns? Understanding this boundary could reshape how we think about model learning and privacy.

Has memory architecture replaced parameter count as the scaling frontier?

Late-2025 research suggests the field's next major efficiency gains come from restructuring how models store and use experience rather than simply making them larger. Three convergent signals point to this shift.

Can agents learn continuously from experience without updating weights?

This explores whether LLM agents can adapt to new tasks and failures by retrieving past experiences from memory alone, rather than requiring expensive parameter fine-tuning or rigid hardcoded rules.

Can agents fail from weak memory control rather than missing knowledge?

As multi-turn agent workflows grow longer, performance degrades—but is this due to insufficient context or poor memory management? This explores whether memory *control* is the real bottleneck.

Can agents learn preferences by watching rather than asking?

Explores whether multimodal agents can build accurate preference models through continuous observation of user behavior, without explicit instruction, by organizing memory around entities and separating concrete events from derived knowledge.

Where does a model store memorized paragraphs?

Can we pinpoint the specific layers, attention heads, and tokens where language models localize verbatim memorization? Understanding this spatial signature could enable targeted unlearning.

Can storing evolved thoughts prevent inconsistent reasoning in conversations?

When LLMs repeatedly reason over the same conversation history for different questions, they produce inconsistent results. Can storing pre-reasoned thoughts instead of raw history solve this problem?

Can recursive subtask trees overcome context window limits?

Explores whether modeling reasoning as prunable trees of subtasks could eliminate the context length constraints that currently force developers into multi-agent architectures. Asks if working memory can become truly unlimited through selective KV cache retention.

Do RL agents accidentally use environments as memory?

Explores whether reinforcement learning agents unintentionally create external memory through environmental artifacts—like trails and marks—without being explicitly trained to do so, and whether this constitutes genuine cognitive extension.

Can reasoning systems maintain memory across retrieval cycles?

Existing retrieval systems treat each lookup independently. But what if reasoning required a persistent memory workspace that evolves as contradictions emerge and understanding deepens?

Can we detect memorable moments by observing emotional expressions?

Emotion recognition systems assume that detecting emotional moments will identify what people remember. But does observed emotion in group settings actually predict individual memorability, or does the proxy fail?

Where do memorization errors arise in chain-of-thought reasoning?

Explores whether memorization in language model reasoning can be localized to specific token sources and which sources dominate error patterns during long generations.

Source papers 56

The arXiv papers behind this sub-topic. Links may take you off-site to arxiv.org.