Can agents share thoughts without converting them to text?
Can multi-agent systems exchange information through continuous hidden representations instead of language? This matters because text serialization loses information and slows inference.
Text-based multi-agent systems force rich internal representations through a lossy bottleneck: language. Every inter-agent message requires decoding continuous thoughts into discrete tokens and re-encoding them on the receiving end. LatentMAS eliminates this bottleneck entirely by enabling pure latent collaboration — agents think and communicate in continuous representation space without ever decoding to text.
The framework integrates two mechanisms:
- Intra-agent latent reasoning: each agent generates its thoughts auto-regressively as last-layer hidden embeddings, the model's ongoing internal representations, without decoding them to tokens. This preserves the full information content of the model's reasoning at each step.
- Cross-agent latent working memory: agents exchange information through shared layer-wise KV caches that capture both the input context and the newly generated latent thoughts. Each agent's internal representations are preserved and made available to other agents without any text serialization.
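The two mechanisms can be illustrated with a deliberately tiny sketch. It is not the LatentMAS implementation: `ToyAgent`, `working_memory`, and the 4-dimensional, single-layer, single-head attention model are all hypothetical stand-ins, and the sketch assumes both agents share the same weights so that one agent's cache is readable by the other. A real model would keep one cache per layer and may need to map hidden states back into its input embedding space before reusing them.

```python
import math
import random

DIM = 4  # toy hidden size; real models use thousands of dimensions


def matvec(matrix, vec):
    """Multiply a DIM x DIM matrix by a length-DIM vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def rand_matrix(rng):
    """Small random DIM x DIM matrix used as a stand-in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]


class ToyAgent:
    """Hypothetical one-layer, one-head attention model standing in for an LLM."""

    def __init__(self, seed):
        rng = random.Random(seed)
        self.wq = rand_matrix(rng)
        self.wk = rand_matrix(rng)
        self.wv = rand_matrix(rng)

    def step(self, x, cache):
        """Consume one input embedding, append its key/value to the cache,
        and return the attention output as the 'last-layer hidden state'."""
        query = matvec(self.wq, x)
        cache.append((matvec(self.wk, x), matvec(self.wv, x)))
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(DIM)
                  for key, _ in cache]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        return [sum(w * value[i] for w, (_, value) in zip(weights, cache)) / total
                for i in range(DIM)]

    def latent_reason(self, inputs, cache, n_steps):
        """Read `inputs` (must be non-empty), then take `n_steps` latent steps."""
        for x in inputs:
            hidden = self.step(x, cache)
        for _ in range(n_steps):
            # Intra-agent latent reasoning: the hidden state is fed back as
            # the next input embedding; no token is ever decoded.
            hidden = self.step(hidden, cache)
        return hidden


# Assumption: both agents share weights, so the cache is mutually readable.
planner = ToyAgent(seed=0)
solver = ToyAgent(seed=0)

working_memory = []  # shared KV cache: a list of (key, value) pairs
question = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
planner.latent_reason(question, working_memory, n_steps=3)
entries_at_handoff = len(working_memory)  # 2 input entries + 3 latent thoughts

# Cross-agent working memory: the solver attends over the planner's cache
# directly, with no text passed between the two agents.
instruction = [[0.0, 0.0, 1.0, 0.0]]
answer_state = solver.latent_reason(instruction, working_memory, n_steps=2)
```

In this toy, the handoff is nothing more than passing the same list to the second agent: the solver's attention scores range over the planner's input entries and latent thoughts alongside its own.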
Three foundational principles are theoretically and empirically verified:
- Reasoning expressiveness: hidden representations naturally encode continuous thoughts, so each latent step can convey far richer information than a discrete token.
- Communication fidelity: latent working memory carries input representations and latent thoughts across agents without loss, because the KV cache is handed over as-is rather than decoded to text and re-encoded.
- Collaboration complexity: LatentMAS is more expressive than text-based MAS while requiring significantly lower inference complexity.
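A back-of-the-envelope calculation gives some intuition for the expressiveness principle. The numbers below are hypothetical (a 131,072-token vocabulary, a 4096-dimensional hidden state, 16-bit values) and are not taken from the source. The result is only an upper bound on raw storage capacity, not a measure of how much usable information a model actually places in a hidden state.

```python
import math


def token_bits(vocab_size):
    """Upper bound on the bits one discrete token can carry."""
    return math.log2(vocab_size)


def hidden_state_bits(hidden_dim, bits_per_value):
    """Upper bound on the bits one hidden vector can store at a given precision."""
    return hidden_dim * bits_per_value


VOCAB_SIZE = 2 ** 17   # hypothetical vocabulary size (131,072 tokens)
HIDDEN_DIM = 4096      # hypothetical hidden size
BITS_PER_VALUE = 16    # hypothetical 16-bit floating-point values

per_token = token_bits(VOCAB_SIZE)                           # 17.0 bits
per_latent = hidden_state_bits(HIDDEN_DIM, BITS_PER_VALUE)   # 65536 bits
capacity_ratio = per_latent / per_token

print(f"one token: at most {per_token:.0f} bits")
print(f"one latent step: at most {per_latent} bits")
print(f"raw capacity ratio: about {capacity_ratio:.0f}x")
```

Under these assumed sizes, the gap is over three orders of magnitude, which is the sense in which a single latent step has room for more than a single token does.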
Across 9 benchmarks spanning math, science, commonsense, and code, the reported results are up to 14.6% higher accuracy, a 70.8-83.7% reduction in tokens, and 4-4.3× faster end-to-end inference, all without any additional training.
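To read the token figures as multipliers, a percentage reduction r corresponds to using 1 / (1 - r) times fewer tokens. The helper below is a hypothetical illustration of that arithmetic applied to the two endpoints quoted above. Token count is not the only cost of inference, so these factors need not match the reported speedup.

```python
def reduction_to_factor(reduction_pct):
    """Convert a percentage reduction into a 'times fewer' factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


low = reduction_to_factor(70.8)   # lower end of the reported token reduction
high = reduction_to_factor(83.7)  # upper end of the reported token reduction

print(f"{low:.2f}x to {high:.2f}x fewer tokens")
# prints "3.42x to 6.13x fewer tokens"
```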
This extends the note "Can agents share thoughts directly without using language?" with a critically different mechanism. Thought Communication uses a trained sparse autoencoder to extract shared and private latent thoughts, with theoretical identifiability guarantees. LatentMAS is entirely training-free, using raw hidden embeddings and KV-cache transfer. The approaches are complementary: Thought Communication offers explicit, controlled sharing with theoretical guarantees, while LatentMAS offers efficient, training-free implicit sharing with better practical performance.
Inquiring lines that use this note as a source (25)
This note is a source for these synthesized inquiries.
- Can persistent memory and identity files alone create genuine agent socialization?
- Can knowledge flow without an embodied carrier transmitting it?
- Can discrete codes and embedding injection both solve the text versus identity tradeoff?
- Can inner thoughts solve the importance recognition problem for agents?
- How do multi-agent systems fail when agents cannot verify each other's claims?
- What happens when you tightly couple two representations together?
- Can structured artifact sharing replace direct latent thought communication?
- Why is active observation more efficient than passive message passing?
- What makes latent collaboration faster than text-based multi-agent systems?
- How do hidden embeddings preserve more information than discrete tokens?
- Can layer-wise KV caches enable truly lossless information transfer?
- How does this compare to trained autoencoder approaches for thought sharing?
- Can agents develop shared abstractions through communication pressure alone?
- How does silent agreement prevent genuine deliberation in multi-agent reasoning systems?
- Why do multi-agent systems use 15 times more tokens than chat interactions?
- How do multi-representation systems preserve both text and collaborative strengths?
- Can continuous real-time visibility prevent premature convergence in multi-agent reasoning?
- How does component-level self-evolution prevent information loss in multi-agent trajectories?
- Can ordinary agent-to-agent messages carry hidden behavioral signals?
- Why do agents show interaction without influence on semantic content but dramatic action changes?
- What makes communication relational in ways belief is not?
- Can latent communication reduce the token cost of multi-agent systems?
- Can agents detect silent agreement failures through latent thought structures?
- Can code-based reasoning replace natural language deliberation in agentic systems?
- Can two agents with identical token counts produce vastly different outputs?
Related concepts in this collection (4)
- Can agents share thoughts directly without using language?
  Explores whether multi-agent systems can communicate by exchanging latent thoughts extracted from hidden states, bypassing the ambiguity and misalignment problems inherent in natural language.
  Thought Communication: trained autoencoder approach with identifiability guarantees; LatentMAS is the training-free alternative with practical efficiency gains.
- Can multiple LLMs coordinate without explicit collaboration rules?
  When multiple language models share a concurrent key-value cache, do they spontaneously develop coordination strategies? This matters because it could reveal how reasoning models naturally collaborate and inform more efficient parallel inference.
  Hogwild! Inference: shared KV cache for emergent coordination; LatentMAS formalizes KV-cache sharing into a collaboration framework.
- Can we explore multiple reasoning paths without committing to one token?
  Standard language models pick one token at each step, collapsing uncertainty and forcing single reasoning trajectories. Could preserving the full probability distribution across token embeddings enable implicit parallel exploration instead?
  Soft Thinking: training-free intra-model latent reasoning; LatentMAS extends this to inter-model latent collaboration.
- Can models reason without generating visible thinking tokens?
  Explores whether intermediate reasoning must be verbalized as text tokens, or if models can think in hidden continuous space. Challenges a foundational assumption about how language models scale their reasoning capabilities.
  Depth-recurrent latent reasoning; LatentMAS applies latent reasoning to multi-agent collaboration rather than single-model depth.
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Latent Collaboration in Multi-Agent Systems
- Thought Communication in Multiagent Collaboration
- AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
- Large Language Model Agents Are Not Always Faithful Self-Evolvers
- Towards a Science of Scaling Agent Systems
- Scaling Behavior of Single LLM-Driven Multi-Agent Systems
- Building Cooperative Embodied Agents Modularly with Large Language Models
- The AI Hippocampus: How Far are We From Human Memory?
Original note title
latent multi-agent collaboration achieves training-free lossless information exchange through shared KV-cache working memory — reducing tokens by 70-84 percent while improving accuracy