INQUIRING LINE


Can in-context learning's advantage erode once interaction histories exceed the context window?

This explores whether in-context learning — adapting from examples in the prompt without weight updates — loses its edge once a running interaction grows longer than what the context window can hold, and what the corpus offers as the fallback.

The corpus suggests that in-context learning (ICL) does stop paying off once an interaction history outgrows the context window, but the more interesting finding is *why*, and what takes over. ICL's whole advantage is that it adapts without touching weights: the model reads recent examples and generalizes on the fly. But that advantage is rented from the window. The clearest reframing comes from work arguing that the long-context limit isn't really about memory capacity at all. It is about the *compute* needed to fold evicted context into the model's internal state, a kind of offline 'sleep' consolidation that improves with more passes (see: Is long-context bottleneck really about memory or compute?). Read that way, exceeding the window doesn't just truncate history; it forces a phase change from cheap in-context reading to expensive consolidation.

There's also a subtler erosion that happens *before* you hit the token ceiling. ICL for sequential tasks depends on having coherent trajectories from the same setting sitting in the prompt ('trajectory burstiness'), not scattered isolated examples (see: Why do trajectories matter more than individual examples for in-context learning?). As a history grows long and gets compressed or windowed, exactly that structure is what gets shredded first, so the quality of the in-context signal can degrade well ahead of the hard limit. And even with room to spare, in-context information loses to strong training priors: models override what's in their context with parametric associations, which textual prompting alone can't fix (see: Why do language models ignore information in their context?). So 'just keep it in context' was never a clean guarantee even before overflow.
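The shredding effect can be made concrete with a small sketch. It is an illustration only, not code from any of the cited notes: the `Step` tuple layout, the function names, and the idea of measuring the budget in steps rather than tokens are all assumptions made for the example. It contrasts a naive recency window, which cuts through episode boundaries, with a selection that keeps only complete trajectories from the current environment.

```python
from itertools import groupby

# Assumed layout: (environment id, episode id, observation/action text).
Step = tuple[str, int, str]


def tail_window(history: list[Step], budget: int) -> list[Step]:
    """Naive windowing: keep the most recent `budget` steps, ignoring boundaries."""
    return history[-budget:] if budget > 0 else []


def whole_trajectories(history: list[Step], budget: int, env: str) -> list[Step]:
    """Keep complete episodes from `env`, newest first, while they fit the budget."""
    # Consecutive steps sharing (environment, episode) form one trajectory.
    episodes = [list(g) for _, g in groupby(history, key=lambda s: (s[0], s[1]))]
    kept: list[list[Step]] = []
    used = 0
    for episode in reversed(episodes):
        if episode[0][0] != env or used + len(episode) > budget:
            continue  # skip other environments and episodes that would not fit whole
        kept.append(episode)
        used += len(episode)
    kept.reverse()  # restore chronological order
    return [step for episode in kept for step in episode]


history: list[Step] = [
    ("maze-A", 1, "a1"), ("maze-A", 1, "a2"), ("maze-A", 1, "a3"),
    ("maze-B", 2, "b1"), ("maze-B", 2, "b2"),
    ("maze-A", 3, "c1"), ("maze-A", 3, "c2"),
]

naive = tail_window(history, budget=5)
bursty = whole_trajectories(history, budget=5, env="maze-A")
```

With the same budget of five steps, `naive` holds the tail fragment of episode 1 plus an unrelated `maze-B` episode, while `bursty` holds two intact `maze-A` trajectories.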

The corpus's answer to the overflow problem is to stop relying on the window and externalize learning into durable stores. Reflexion writes verbal self-diagnoses into episodic memory so an agent improves across episodes with no weight updates, and notably keeps those reflections *uncompressed* because squeezing them is what destroys their usability (see: Can agents learn from failure without updating their weights?). VOYAGER goes further, storing executable skills in an embedding-indexed library and composing new ones from old, achieving lifelong learning without the catastrophic forgetting that weight updates cause (see: Can agents learn new skills without forgetting old ones?). These keep the *spirit* of ICL (no fine-tuning) while escaping the window by retrieving only what's relevant now.
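A minimal sketch of the "retrieve only what's relevant now" pattern follows. It is not VOYAGER's or Reflexion's actual implementation: the `SkillLibrary` class, its methods, and the skill names are hypothetical, and a bag-of-words count vector stands in for a learned embedding model so the example runs with the standard library alone.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SkillLibrary:
    """Hypothetical durable store: entries live outside the prompt, stored uncompressed."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, str, Counter]] = []

    def add(self, description: str, body: str) -> None:
        # Index by description, keep the full body verbatim.
        self._entries.append((description, body, embed(description)))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Only the top-k matches are returned for inclusion in the prompt.
        q = embed(query)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[2]), reverse=True)
        return [body for _, body, _ in ranked[:k]]


library = SkillLibrary()
library.add("craft a wooden pickaxe", "def craft_wooden_pickaxe(bot): ...")
library.add("mine iron ore", "def mine_iron_ore(bot): ...")
library.add("cook raw fish", "def cook_raw_fish(bot): ...")

prompt_skills = library.retrieve("mine some iron ore", k=1)
```

The library can grow without bound while the prompt only ever carries `k` entries, which is the sense in which the store escapes the window.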

The sharper lesson is that not all history deserves equal space, which is how you make finite context go further. SkillRL processes successes and failures differently (successes as concrete demonstrations, failures as abstracted lessons), hitting state-of-the-art while using substantially less context than treating everything uniformly (see: Should successful and failed episodes be processed differently?). The ACE framework treats context as an evolving playbook updated incrementally rather than rewritten, precisely to avoid the 'context collapse' and detail erosion that compression causes when histories get long (see: Can context playbooks prevent knowledge loss during iteration?). Both are admissions that the naive 'dump the whole history in the prompt' strategy doesn't scale.
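The two ideas in this paragraph, asymmetric handling of outcomes and incremental updates instead of rewrites, can be sketched together. This is a toy under stated assumptions, not the SkillRL or ACE implementation: the `Playbook` class and its fields are hypothetical, and the abstracted lesson is passed in by the caller, where a real system would presumably generate it with a model.

```python
from dataclasses import dataclass, field


@dataclass
class Playbook:
    """Hypothetical context store, updated by small deltas rather than rewritten."""

    demonstrations: list[str] = field(default_factory=list)
    lessons: dict[str, str] = field(default_factory=dict)

    def record(self, trajectory: list[str], succeeded: bool,
               lesson_key: str = "", lesson: str = "") -> None:
        if succeeded:
            # Success: keep the concrete steps as a demonstration.
            self.demonstrations.append(" -> ".join(trajectory))
        else:
            # Failure: drop the raw steps and keep a short lesson, keyed so a
            # repeat failure of the same kind updates one entry in place.
            self.lessons[lesson_key] = lesson

    def render(self) -> str:
        """Build the text that would actually be placed in the prompt."""
        lines = ["Demonstrations:"] + [f"- {d}" for d in self.demonstrations]
        lines += ["Lessons:"] + [f"- {k}: {v}" for k, v in self.lessons.items()]
        return "\n".join(lines)


playbook = Playbook()
playbook.record(["open drawer", "take key", "unlock door"], succeeded=True)
playbook.record(["push door", "push door", "push door"], succeeded=False,
                lesson_key="locked-door", lesson="find a key before trying the door")
playbook.record(["kick door"], succeeded=False,
                lesson_key="locked-door", lesson="find a key before forcing the door")

context = playbook.render()
```

Each call touches a single entry, so earlier details are never re-summarized, and the failed trajectories cost one line of context between them rather than one line per step.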

The thing you didn't know you wanted to know: the most efficient long-horizon agents may not fight the window at all. They offload memory into the *world*. A mathematical result shows RL agents spontaneously use spatial environments as external memory, leaving artifacts that reduce the information their context needs to carry, with no explicit memory objective (see: Do RL agents accidentally use environments as memory?). So 'history exceeds the window' is less a wall than a signal: the advantage doesn't vanish, it migrates out of the prompt and into episodic stores, skill libraries, consolidated weights, or even the environment itself.

Sources 8 notes

Is long-context bottleneck really about memory or compute?

Research shows the bottleneck is not memory capacity but the compute required to consolidate evicted context into fast weights during offline sleep phases. Performance improves with more consolidation passes, following a test-time scaling pattern on harder reasoning tasks.

Why do trajectories matter more than individual examples for in-context learning?

In-context learning for sequential decision-making requires full or partial trajectories from the same environment level, not just isolated examples. This structural property—trajectory burstiness—allows models to generalize across vastly different tasks without weight updates.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Should successful and failed episodes be processed differently?

SkillRL demonstrates that treating successful episodes as concrete demonstrations and failures as abstracted lessons achieves state-of-the-art performance on complex tasks while using substantially less context than uniform approaches. The asymmetry mirrors human expert reasoning and avoids the degradation seen in uniform consolidation methods.

Can context playbooks prevent knowledge loss during iteration?

The ACE framework treats contexts as evolving playbooks using generation-reflection-curation loops rather than full rewrites. This prevents knowledge loss from compression and detail erosion, achieving +10.6% on agentic tasks and +8.6% on finance without labeled supervision.

Do RL agents accidentally use environments as memory?

A mathematical proof shows that environmental artifacts reduce the information needed to represent history in RL agents. Path-following agents naturally develop memory-like behavior through standard reward optimization, satisfying situated cognition criteria without explicit memory objectives.
