SYNTHESIS NOTE

Can agents share thoughts without converting them to text?

Can multi-agent systems exchange information through continuous hidden representations instead of language? This matters because text serialization loses information and slows inference.

Synthesis note · 2026-02-23 · sourced from Agents Multi Architecture

Text-based multi-agent systems force rich internal representations through a lossy bottleneck: language. Every inter-agent message requires decoding continuous thoughts into discrete tokens, which the receiving agent must then re-encode. LatentMAS removes this bottleneck by enabling pure latent collaboration: agents think and communicate in continuous representation space without ever decoding to text.
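To make the bottleneck concrete, here is a toy sketch in Python. The three-token vocabulary, its embeddings, and the helper names (`decode_to_token`, `encode_token`) are hypothetical and do not come from LatentMAS. The sketch only shows that a decode-then-re-encode round trip collapses a continuous vector onto a single token embedding, while handing over the vector itself loses nothing.

```python
import math

# Hypothetical toy vocabulary: each token maps to a 3-d embedding.
# (Real models use tens of thousands of tokens and thousands of dimensions.)
TOKEN_EMBEDDINGS = {
    "yes": [1.0, 0.0, 0.0],
    "no": [0.0, 1.0, 0.0],
    "maybe": [0.0, 0.0, 1.0],
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def decode_to_token(thought):
    """Text path, step 1: collapse a continuous vector to the most similar token."""
    return max(TOKEN_EMBEDDINGS, key=lambda tok: dot(thought, TOKEN_EMBEDDINGS[tok]))


def encode_token(token):
    """Text path, step 2: the receiving agent re-embeds the discrete token."""
    return list(TOKEN_EMBEDDINGS[token])


def l2_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# A "thought" that leans toward "yes" but carries substantial weight on "maybe".
thought = [0.7, 0.1, 0.6]

via_text = encode_token(decode_to_token(thought))  # decode -> token -> re-encode
via_latent = list(thought)  # latent path: pass the vector over unchanged

print(decode_to_token(thought))       # yes
print(l2_error(thought, via_text))    # ~0.678: the weight on "maybe" is lost
print(l2_error(thought, via_latent))  # 0.0
```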

The framework integrates two mechanisms:

Intra-agent latent reasoning: Each agent generates its thoughts autoregressively as last-layer hidden embeddings, i.e. the model's internal representations, which are never explicitly decoded into tokens. This preserves the full information content of the model's reasoning at each step.
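The loop below is a minimal sketch of latent reasoning under heavy simplifications. A hypothetical one-layer `forward` function stands in for the model, and each hidden state is fed straight back as the next input. In a real transformer, each new latent step would also attend to all earlier positions through the KV cache, and hidden states may need a projection back into the input-embedding space; both are assumptions left out of this sketch.

```python
import math

# Hypothetical toy "model": one layer mapping an input embedding to a
# last-layer hidden state. Weights are fixed so the sketch is deterministic.
W = [
    [0.5, -0.2, 0.1],
    [0.3, 0.8, -0.5],
    [-0.4, 0.2, 0.9],
]


def forward(x):
    """Return the toy model's last-layer hidden state for input embedding x."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def latent_reasoning(prompt_embedding, num_steps):
    """Generate `num_steps` latent thoughts without decoding any of them.

    There is no argmax, sampling, or detokenization between steps: the
    continuous hidden state itself becomes the next input. (Simplification:
    this toy conditions only on the previous step, not the full history.)
    """
    thoughts = []
    x = prompt_embedding
    for _ in range(num_steps):
        h = forward(x)
        thoughts.append(h)
        x = h  # feed the continuous thought back in as the next input
    return thoughts


thoughts = latent_reasoning([1.0, 0.0, 0.0], num_steps=4)
for step, h in enumerate(thoughts, 1):
    print(step, [round(v, 3) for v in h])
```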

Cross-agent latent working memory: Information is exchanged via shared layer-wise KV caches that capture both the input context and newly generated latent thoughts. Each agent's internal representations are preserved and made available to other agents without any text serialization.
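The shared working memory can be sketched as a per-layer list of keys and values that one agent writes and the next agent reads and extends. Everything here is hypothetical toy code: `LayerKVCache`, `write_positions`, and the use of hidden states directly as keys and values (identity projections) are assumptions of this sketch, not the LatentMAS implementation.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over cached keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class LayerKVCache:
    """Hypothetical shared working memory: one key list and one value list per layer."""

    def __init__(self, num_layers):
        self.keys = [[] for _ in range(num_layers)]
        self.values = [[] for _ in range(num_layers)]


def write_positions(cache, hidden_states):
    """Append an agent's positions (context or latent thoughts) to every layer.

    Assumption: keys and values are the hidden states themselves; a real
    model applies learned per-layer key/value projections.
    """
    for h in hidden_states:
        for layer in range(len(cache.keys)):
            cache.keys[layer].append(h)
            cache.values[layer].append(h)


cache = LayerKVCache(num_layers=2)

# Agent A: input context plus two latent thoughts (toy 2-d vectors).
agent_a_states = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
write_positions(cache, agent_a_states)

# Agent B reads and extends the same cache: nothing is serialized to text.
agent_b_states = [[0.5, 0.5]]
write_positions(cache, agent_b_states)

# A query from agent B at layer 0 attends over A's entries and its own.
query = [0.0, 1.0]
out = attend(query, cache.keys[0], cache.values[0])
print(len(cache.keys[0]))  # 4 (3 positions from agent A, 1 from agent B)
print([round(x, 3) for x in out])
```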

Three foundational principles are theoretically and empirically verified:

  1. Reasoning expressiveness: hidden representations naturally encode continuous thoughts, so each latent step can convey far richer information than a discrete token.
  2. Communication fidelity: latent working memory preserves input representations and latent thoughts losslessly, enabling perfect cross-agent information transfer.
  3. Collaboration complexity: LatentMAS achieves higher expressiveness than text-based MAS at significantly lower inference complexity.
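Principle 1 can be illustrated with a rough raw-capacity bound, where $|V|$ is the vocabulary size, $d$ the hidden width, and $b$ the bits per stored element. The numeric values below are hypothetical round numbers, not figures from the paper, and both quantities are upper bounds on raw bits rather than measures of how much useful information a model actually encodes.

```latex
\text{bits per discrete token} \le \log_2 |V|, \qquad
\text{bits per hidden vector} \le d \cdot b

% Hypothetical values: |V| = 32{,}000,\ d = 4096,\ b = 16 (fp16)
\log_2 32{,}000 \approx 14.97 \text{ bits}, \qquad
4096 \cdot 16 = 65{,}536 \text{ bits}
```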

Across 9 benchmarks spanning math, science, commonsense reasoning, and code, LatentMAS reports up to 14.6% higher accuracy, a 70.8–83.7% reduction in tokens, and 4–4.3× faster end-to-end inference, all without any additional training.
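Converting the reported token-reduction range $r$ into a fold reduction is simple arithmetic on the stated figures, not a separately reported result:

```latex
\text{fraction of tokens remaining} = 1 - r, \qquad r \in [0.708,\ 0.837]
\;\Rightarrow\; 1 - r \in [0.163,\ 0.292]

\text{fold reduction} = \frac{1}{1 - r} \in
\left[\frac{1}{0.292},\ \frac{1}{0.163}\right] \approx [3.4,\ 6.1]
```

In other words, the latent system uses roughly 3.4 to 6.1 times fewer tokens than the text-based baseline; the 4–4.3× speedup is a separate end-to-end wall-clock measurement.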

This note extends "Can agents share thoughts directly without using language?" with a fundamentally different mechanism. Thought Communication uses a trained sparse autoencoder to extract shared and private latent thoughts, with theoretical identifiability guarantees; LatentMAS is entirely training-free, using raw hidden embeddings and KV-cache transfer. The approaches are complementary: Thought Communication suits explicit, controlled sharing with theoretical guarantees, while LatentMAS suits efficient, training-free implicit sharing with better practical performance.

Original note title

latent multi-agent collaboration achieves training-free lossless information exchange through shared KV-cache working memory — reducing tokens by 70-84 percent while improving accuracy