SYNTHESIS NOTE
Agentic Systems and Tool Use · Model Architecture and Internals

Can agents learn cooperation by adapting to diverse partners?

Explores whether sequence model agents can develop mutual cooperation strategies through in-context learning when trained against varied co-players, without explicit cooperation mechanisms or hardcoded assumptions.

Synthesis note · 2026-02-23 · sourced from Agents Multi Architecture

Achieving cooperation among self-interested agents is a fundamental challenge in multi-agent reinforcement learning. Existing approaches that achieve mutual cooperation between "learning-aware" agents typically rely on hardcoded assumptions about co-player learning rules or enforce strict separation between fast-timescale "naive learners" and slow-timescale "meta-learners." Both constraints limit scalability.

This paper shows that in-context learning capabilities of sequence models provide a cleaner path. Training sequence model agents against a diverse distribution of co-players naturally induces in-context best-response strategies that effectively function as learning algorithms on the fast intra-episode timescale. No hardcoded assumptions about the opponent. No explicit timescale separation.
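To make "in-context best response" concrete, the sketch below hand-codes what a trained sequence model is supposed to discover. The policy reads only the current episode's history, infers which kind of co-player it faces, and switches its response. Several things here are illustrative assumptions and do not come from the paper: the iterated prisoner's dilemma, the payoff values, the three-member co-player pool, and the probing rule. The paper's agents learn such behavior with RL rather than having it written by hand.

```python
from statistics import mean

# Prisoner's dilemma payoff to the agent, keyed by (agent_move, co_player_move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ROUNDS = 20

# Hypothetical co-player pool. Every policy sees the same history:
# a list of (agent_move, co_player_move) pairs from the current episode.
def always_cooperate(history):
    return "C"

def always_defect(history):
    return "D"

def tit_for_tat(history):
    # Cooperate first, then copy the agent's previous move.
    return "C" if not history else history[-1][0]

CO_PLAYERS = [always_cooperate, always_defect, tit_for_tat]

def fixed(move):
    # A policy that ignores the history entirely.
    return lambda history: move

def reciprocate(history):
    # Agent-side tit-for-tat: copy the co-player's previous move.
    return "C" if not history else history[-1][1]

def in_context_agent(history):
    """Infer the co-player's type from this episode's history, then best-respond."""
    t = len(history)
    if t == 0:
        return "C"                   # open cooperatively
    if history[0][1] == "D":
        return "D"                   # unprovoked defection: treat as a defector
    if t == 1:
        return "D"                   # probe: will the co-player punish a defection?
    if t == 2:
        return "C"                   # make amends; the reply to the probe arrives now
    punished = history[2][1] == "D"  # co-player's reaction to the round-2 probe
    return "C" if punished else "D"  # reciprocate a reactive partner, exploit an unconditional one

def play(agent, co_player, rounds=ROUNDS):
    history, total = [], 0
    for _ in range(rounds):
        a, b = agent(history), co_player(history)
        total += PAYOFF[(a, b)]
        history.append((a, b))
    return total

def average_return(agent):
    # Expected return when the co-player is drawn uniformly from the pool.
    return mean(play(agent, co) for co in CO_PLAYERS)

for name, agent in [("always C", fixed("C")), ("always D", fixed("D")),
                    ("reciprocate", reciprocate), ("in-context", in_context_agent)]:
    print(f"{name:12s} {average_return(agent):.2f}")
```

The co-player is drawn uniformly from this pool. Under that draw, the probing agent averages 58 per episode. The best of the three fixed policies (always defect) averages 48. Against a pool containing only `tit_for_tat`, however, always cooperating scores 60, while the probe costs the in-context agent a point (59). Inferring the co-player only pays once the pool is diverse, which matches the role the note assigns to co-player diversity.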

The cooperation mechanism is elegant. In-context adaptation makes agents vulnerable to extortion: an agent that adapts to whatever its co-player does will also adapt to an exploitative strategy. That vulnerability runs both ways, since each agent's in-context learning dynamics can be shaped by the other. The resulting mutual shaping pressure resolves into cooperative behavior.
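The extortion vulnerability can be made quantitative with a standard game-theory construction that the note itself does not use: Press and Dyson's zero-determinant extortion strategy for the iterated prisoner's dilemma. The sketch below rests on several assumptions. The payoffs are R=3, S=0, T=5, P=1, and the extortion factor is χ=3. The learner is deliberately simple: its adapted policy is summarized by a single cooperation probability q. The code computes exact long-run payoffs rather than simulating episodes.

```python
from fractions import Fraction as F

# Prisoner's dilemma payoffs to (extortioner X, learner Y), keyed by (x_move, y_move).
PAYOFF_X = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
PAYOFF_Y = {(x, y): PAYOFF_X[(y, x)] for (x, y) in PAYOFF_X}

# Press & Dyson's extortionate zero-determinant strategy with chi = 3:
# probability that X cooperates after the previous round's (x_move, y_move).
EXTORT = {("C", "C"): F(11, 13), ("C", "D"): F(1, 2), ("D", "C"): F(7, 26), ("D", "D"): F(0)}

def long_run_payoffs(q):
    """Stationary per-round payoffs (s_x, s_y) when Y cooperates i.i.d. with probability q."""
    # Y's move is independent of the past, so X's move alone forms a two-state Markov chain.
    stay_c = q * EXTORT[("C", "C")] + (1 - q) * EXTORT[("C", "D")]  # P(C next | X played C)
    to_c = q * EXTORT[("D", "C")] + (1 - q) * EXTORT[("D", "D")]    # P(C next | X played D)
    pi_c = to_c / (1 - stay_c + to_c)                              # stationary P(X cooperates)
    px = {"C": pi_c, "D": 1 - pi_c}
    py = {"C": q, "D": 1 - q}
    # X's current move depends only on the past, so it is independent of Y's fresh move.
    joint = {(x, y): px[x] * py[y] for x in "CD" for y in "CD"}
    s_x = sum(p * PAYOFF_X[k] for k, p in joint.items())
    s_y = sum(p * PAYOFF_Y[k] for k, p in joint.items())
    return s_x, s_y

for q in (F(0), F(1, 2), F(1)):
    s_x, s_y = long_run_payoffs(q)
    print(f"q={float(q):.1f}  learner={float(s_y):.3f}  extortioner={float(s_x):.3f}")
```

For every q, the payoffs satisfy s_X - 1 = 3(s_Y - 1). The learner's own payoff rises from 1 at q = 0 to 21/11 at q = 1, so an agent that simply adapts toward higher payoff drifts toward cooperating. Yet for every unit it gains above the mutual-defection payoff P = 1, the extortioner gains three. An adaptive agent is therefore shapeable by whoever it adapts to. This is the kind of pressure the note says two in-context learners exert on each other.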

Three components are necessary and sufficient: (1) sequence model agents with in-context learning capacity, (2) a diverse co-player distribution during training, and (3) decentralized reinforcement learning. Co-player diversity is the key ingredient. It forces the agent to develop general in-context adaptation rather than memorize responses to specific opponents.

Building on the note "Can transformers learn to solve new problems within episodes?", this finding extends in-context reinforcement learning (ICRL) from single-agent environments to multi-agent cooperation. The in-context learning mechanism that enables adaptation to an environment also enables adaptation to a co-player. The social dynamics of mutual adaptation then produce emergent cooperation.

The connection to "Can cooperative bots escape frozen selfish populations?" is structural. In population games, random exploration breaks frozen equilibria. In dyadic games, diverse co-player training breaks the equilibrium of mutual defection. Both work through diversity of experience rather than explicit cooperation incentives.


Original note title

in-context co-player modeling enables cooperation without hardcoded assumptions — training against diverse co-players induces mutual shaping through vulnerability to extortion