Can agents share thoughts without converting them to text?

Can multi-agent systems exchange information through continuous hidden representations instead of language? This matters because text serialization loses information and slows inference.

Synthesis note · 2026-02-23 · sourced from Agents Multi Architecture

Text-based multi-agent systems force rich internal representations through a lossy bottleneck: language. Every inter-agent message requires decoding continuous hidden states into discrete tokens and re-encoding them on the receiving end. LatentMAS removes this bottleneck by enabling pure latent collaboration: agents reason and communicate in continuous representation space without decoding to text.
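The bottleneck described above can be illustrated with a toy sketch. It is not LatentMAS code: the embedding table, the unembedding matrix, and the helper names send_as_text and send_as_latent are hypothetical, and the sizes are arbitrary. It only shows that collapsing a hidden vector to a single token and re-embedding it changes the vector, while handing the vector over directly does not.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_dim = 50, 8  # toy sizes, chosen only for illustration

# Hypothetical toy parameters: an input embedding table and an output (unembedding) matrix.
embed = rng.normal(size=(vocab_size, hidden_dim))
unembed = rng.normal(size=(hidden_dim, vocab_size))


def send_as_text(hidden):
    """Decode a hidden vector to one token id, then re-embed it on the receiving side."""
    token_id = int(np.argmax(hidden @ unembed))  # greedy decoding to a discrete token
    return embed[token_id]  # the receiver only sees that token's embedding


def send_as_latent(hidden):
    """Hand the hidden vector over unchanged."""
    return hidden.copy()


thought = rng.normal(size=hidden_dim)
text_error = float(np.linalg.norm(thought - send_as_text(thought)))
latent_error = float(np.linalg.norm(thought - send_as_latent(thought)))
print(f"reconstruction error via text: {text_error:.3f}, via latent: {latent_error:.3f}")
```

The latent path reproduces the vector exactly, whereas the text path returns whichever embedding row the chosen token maps to.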

The framework integrates two mechanisms:

Intra-agent latent reasoning: Each agent generates thoughts as auto-regressive last-layer hidden embeddings — the model's ongoing internal representations without explicit decoding. This preserves the full information content of the model's reasoning at each step.

Cross-agent latent working memory: Information is exchanged via shared layer-wise KV caches that capture both the input context and newly generated latent thoughts. Each agent's internal representations are preserved and made available to other agents without any text serialization.
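A minimal sketch of how the two mechanisms could fit together, under loud assumptions: ToyModel, latent_reasoning, and the planner/solver prompts are hypothetical names, the weights are random, and both agents are assumed to run the same underlying model so that cached keys and values are compatible. The toy also feeds the last-layer hidden state straight back in as the next input; a real model may need to map it into its input embedding space first.

```python
import numpy as np


class ToyModel:
    """Tiny attention stack with random weights; a stand-in for a real LLM."""

    def __init__(self, dim, num_layers, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.layers = [
            {name: rng.normal(scale=scale, size=(dim, dim)) for name in ("q", "k", "v")}
            for _ in range(num_layers)
        ]

    def new_cache(self):
        """One key list and one value list per layer (a layer-wise KV cache)."""
        return [{"k": [], "v": []} for _ in self.layers]

    def step(self, x, cache):
        """Process one input vector, appending its keys/values to every layer's cache."""
        for weights, layer_cache in zip(self.layers, cache):
            layer_cache["k"].append(x @ weights["k"])
            layer_cache["v"].append(x @ weights["v"])
            keys = np.stack(layer_cache["k"])
            values = np.stack(layer_cache["v"])
            scores = keys @ (x @ weights["q"]) / np.sqrt(x.shape[0])
            attn = np.exp(scores - scores.max())
            attn /= attn.sum()
            x = np.tanh(x + attn @ values)
        return x  # last-layer hidden state


def latent_reasoning(model, prompt, cache, latent_steps):
    """Read the prompt, then generate latent thoughts without decoding any token."""
    hidden = None
    for token_embedding in prompt:
        hidden = model.step(token_embedding, cache)
    for _ in range(latent_steps):
        hidden = model.step(hidden, cache)  # hidden state becomes the next input
    return hidden


rng = np.random.default_rng(1)
dim = 16
model = ToyModel(dim=dim, num_layers=2)
shared_cache = model.new_cache()  # the cross-agent latent working memory

planner_prompt = rng.normal(size=(3, dim))  # hypothetical first agent's input
solver_prompt = rng.normal(size=(2, dim))   # hypothetical second agent's input

latent_reasoning(model, planner_prompt, shared_cache, latent_steps=4)
entries_after_planner = len(shared_cache[0]["k"])

# The second agent attends over everything the first agent left in the cache.
solver_state = latent_reasoning(model, solver_prompt, shared_cache, latent_steps=4)
entries_after_solver = len(shared_cache[0]["k"])
```

In this sketch the second agent never receives text from the first; it only inherits the cache, which holds the first agent's prompt entries and latent thoughts at every layer.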

Three foundational principles are theoretically and empirically verified:

Reasoning expressiveness — hidden representations naturally encode continuous thoughts, allowing each latent step to convey far richer information than discrete tokens.
Communication fidelity — latent working memory preserves input representations and latent thoughts losslessly, enabling perfect cross-agent information transfer.
Collaboration complexity — LatentMAS is more expressive than text-based multi-agent systems (MAS) while incurring significantly lower inference complexity.
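The expressiveness principle can be made concrete with back-of-the-envelope arithmetic. The vocabulary size, hidden width, and float precision below are assumed values for illustration, not figures from the source. A single discrete token can carry at most log2(vocabulary size) bits, while a hidden vector occupies hidden_dim floats; the raw storage size is only an upper bound on capacity, not a measurement of how much information a latent step actually carries.

```python
import math

vocab_size = 32_000   # assumed vocabulary size, not from the source
hidden_dim = 4_096    # assumed hidden width, not from the source
bits_per_float = 16   # assumed half-precision storage

# Upper bound on the information carried by one discrete token.
bits_per_token = math.log2(vocab_size)

# Raw storage of one latent step: an upper bound on capacity, not measured content.
raw_bits_per_latent = hidden_dim * bits_per_float

ratio = raw_bits_per_latent / bits_per_token
print(f"one token: at most {bits_per_token:.1f} bits")
print(f"one latent step: up to {raw_bits_per_latent} raw bits")
print(f"capacity ratio (upper bound): about {ratio:.0f}x")
```

Under these assumptions a token tops out at roughly 15 bits, so the gap in raw capacity is several thousand-fold, which is the intuition behind each latent step conveying more than a discrete token.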

Across 9 benchmarks spanning math, science, commonsense, and code, the empirical results show up to 14.6% higher accuracy, 70.8-83.7% token reduction, and 4-4.3× faster end-to-end inference, all without any additional training.

This extends the note "Can agents share thoughts directly without using language?" with a critically different mechanism. Thought Communication uses a trained sparse autoencoder to extract shared and private latent thoughts with theoretical identifiability guarantees. LatentMAS is entirely training-free, using raw hidden embeddings and KV-cache transfer. The approaches are complementary: Thought Communication offers explicit, controlled sharing with theoretical guarantees, while LatentMAS offers efficient, training-free implicit sharing with better practical performance.

Inquiring lines that read this note 28

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How should memory consolidation strategies shape agent performance over time?

Can persistent memory and identity files alone create genuine agent socialization?

Is embodied interaction necessary for language meaning and genuine agency?

Can knowledge flow without an embodied carrier transmitting it?

Why do semantic similarity and task relevance diverge in vector embeddings?

Why do reward structures fail to shape long-term agent learning?

Can inner thoughts solve the importance recognition problem for agents?

Can debate mechanisms prevent silent agreement on wrong answers in multi-agent reasoning?

Can AI-generated outputs constitute genuine knowledge or valid claims?

What happens when you tightly couple two representations together?

How do standardized protocols improve coordination in multi-agent systems?

How do multi-agent systems achieve genuine cooperation and reasoning?

Does recurrence enable reasoning capabilities that fixed-depth transformers cannot achieve?

Can layer-wise KV caches enable truly lossless information transfer?

What drives capability and cost efficiency in agent systems?

Does conversational format create illusions of genuine AI communication?

Why do agents show interaction without influence on semantic content but dramatic action changes?

How do language models establish social grounding in human dialogue?

What makes communication relational in ways belief is not?

Why should disagreement be treated as signal in collaborative reasoning?

Can agents detect silent agreement failures through latent thought structures?

How can AI agents autonomously learn and transfer skills across tasks?

Can agents teach each other skills without human supervision?

How should agents balance memory condensation to optimize context efficiency?

How should embedding model speed constrain agent memory system design?

Related concepts in this collection 4

Can agents share thoughts directly without using language? Explores whether multi-agent systems can communicate by exchanging latent thoughts extracted from hidden states, bypassing the ambiguity and misalignment problems inherent in natural language.
Thought Communication: trained autoencoder approach with identifiability guarantees; LatentMAS is the training-free alternative with practical efficiency gains
Can multiple LLMs coordinate without explicit collaboration rules? When multiple language models share a concurrent key-value cache, do they spontaneously develop coordination strategies? This matters because it could reveal how reasoning models naturally collaborate and inform more efficient parallel inference.
Hogwild! Inference: shared KV cache for emergent coordination; LatentMAS formalizes the KV-cache sharing into a collaboration framework
Can we explore multiple reasoning paths without committing to one token? Standard language models pick one token at each step, collapsing uncertainty and forcing single reasoning trajectories. Could preserving the full probability distribution across token embeddings enable implicit parallel exploration instead?
Soft Thinking: training-free intra-model latent reasoning; LatentMAS extends this to inter-model latent collaboration
Can models reason without generating visible thinking tokens? Explores whether intermediate reasoning must be verbalized as text tokens, or if models can think in hidden continuous space. Challenges a foundational assumption about how language models scale their reasoning capabilities.
depth-recurrent latent reasoning; LatentMAS applies latent reasoning to multi-agent collaboration rather than single-model depth
