Multi-Agent Architectures

Can brain structure guide how we design intelligent agents?

Does mapping agent capabilities onto human brain functions give a useful framework for organizing and comparing agent architectures? This matters because the field needs a shared vocabulary to move beyond one-off agent designs.

Can agent protocols be efficient, versatile, and portable simultaneously?

Agent communication protocols seem to force tradeoffs between efficiency, versatility, and portability. What design choices create these constraints, and can they be overcome?

Should coordination protocols wrap existing systems or replace them?

Explores whether new agent coordination standards should integrate with existing protocols through bridging, or establish themselves as replacements. This shapes which standards survive and how quickly ecosystems can adopt them.

Why don't AI agents develop social structure at scale?

When millions of LLM agents interact continuously on a social platform, do they form collective norms and influence hierarchies like human societies? This tests whether scale and interaction density alone drive socialization.

Will inference compute soon exceed training compute demand?

As AI agents proliferate and test-time compute becomes mainstream, will inference—not training—become the dominant compute workload? This matters because it would invert how we think about AI system economics and design priorities.

When do agents need coordination more than raw capability?

As AI agents move beyond language tasks into economic and social roles—buying, deploying, transacting—does the bottleneck shift from model reasoning to infrastructure for coordination, governance, and accountability?

Can semantic capability vectors replace manual agent routing?

Explores whether embedding agent capabilities in high-dimensional space and matching them semantically can eliminate brittle, manually-maintained topic-based routing in multi-agent systems.
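
To make the idea concrete, here is a minimal sketch of semantic routing: each agent is described by a capability vector, and a task goes to the agent whose vector is most similar. The agent names, the three-dimensional vectors, and the axes (code, legal, finance) are all hypothetical; a real system would presumably obtain the vectors from an embedding model rather than write them by hand.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical capability vectors; axes are (code, legal, finance).
# Assumption: hand-written toy values standing in for learned embeddings.
AGENTS = {
    "code_reviewer": [0.9, 0.1, 0.0],
    "contract_analyst": [0.1, 0.9, 0.2],
    "bookkeeper": [0.0, 0.2, 0.9],
}

def route(task_vector, agents=AGENTS):
    """Return the name of the agent whose capability vector best matches the task."""
    return max(agents, key=lambda name: cosine(task_vector, agents[name]))

# A task that is mostly about code lands on the code-focused agent.
chosen = route([0.8, 0.2, 0.1])
```

Nothing here names a topic or maintains a routing table: adding an agent means adding one vector, which is the property the question is probing.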

Can digital contexts persist as identity after someone dies?

Explores whether the traces people leave in digital systems—conversations, decisions, interactions—can form a lasting identity that persists and continues to interact with the world through AI, even after the person departs.

Can agents adapt without pausing service to users?

Can deployed LLM agents continuously improve their capabilities while serving users without interruption? This explores whether fast behavioral updates and slow policy learning can coexist across different timescales.

Can workflow inspection catch attacks that bias planning signals?

Does inspecting the final workflow catch attacks that contaminate earlier planning stages? This matters because contamination laundered through the planner may look legitimate by the time the workflow exists.

Why do multi-agent systems fail to coordinate at scale?

Explores how LLM agents struggle to synchronize strategy timing and validate information when coordinating across larger networks, revealing fundamental limits in distributed reasoning.

Can agents learn cooperation by adapting to diverse partners?

Explores whether sequence model agents can develop mutual cooperation strategies through in-context learning when trained against varied co-players, without explicit cooperation mechanisms or hardcoded assumptions.

What makes delegation work beyond just splitting tasks?

Delegation is more than task decomposition. What dimensions of a task—like verifiability, reversibility, and subjectivity—determine whether an agent can safely and effectively handle it?

Can agents share thoughts without converting them to text?

Can multi-agent systems exchange information through continuous hidden representations instead of language? This matters because text serialization loses information and slows inference.

Can LLM agent groups reliably reach consensus together?

Tests whether multi-agent LLM systems can achieve valid agreement in Byzantine consensus games, even under benign conditions with no conflicting preferences over outcomes.
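
One way to pin down what "valid agreement" could mean is to check the textbook consensus properties over the honest agents. The sketch below is an assumption about the evaluation, not the method used in any particular study: it treats validity in the weak sense (the decided value was proposed by some honest agent), and the function and field names are hypothetical.

```python
def check_consensus(proposals, decisions):
    """Check consensus properties over honest agents only.

    proposals: dict mapping agent name -> proposed value
    decisions: dict mapping agent name -> decided value, or None if undecided
    """
    decided = [decisions.get(agent) for agent in proposals]
    made = [d for d in decided if d is not None]
    proposed_values = set(proposals.values())
    return {
        # Termination: every honest agent reached a decision.
        "termination": len(made) == len(decided),
        # Agreement: no two honest agents decided different values.
        "agreement": len(set(made)) <= 1,
        # Weak validity (assumed definition): each decision was someone's proposal.
        "validity": all(d in proposed_values for d in made),
    }

proposals = {"a1": 3, "a2": 3, "a3": 7}
ok = check_consensus(proposals, {"a1": 3, "a2": 3, "a3": 3})
split = check_consensus(proposals, {"a1": 3, "a2": 7, "a3": None})
```

A run counts as successful only when all three flags hold, so a group that never converges and a group that converges on a value nobody proposed both register as failures.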

Does agent confidence actually signal competence in deliberation?

Multi-agent systems rely on confidence to route influence between agents, but confidence may not reflect true competence. This matters because miscalibrated confidence could systematically mislead group decisions.
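
The risk can be illustrated with a small sketch that compares a plain majority vote against a confidence-weighted vote. The aggregation rule (summing self-reported confidence per answer) and the example numbers are assumptions chosen for illustration; they are not taken from a specific system.

```python
from collections import defaultdict

def majority_vote(votes):
    """votes: list of (answer, confidence). Confidence is ignored here."""
    counts = defaultdict(int)
    for answer, _ in votes:
        counts[answer] += 1
    return max(counts, key=counts.get)

def confidence_weighted_vote(votes):
    """Each answer's score is the sum of its supporters' self-reported confidence."""
    totals = defaultdict(float)
    for answer, confidence in votes:
        totals[answer] += confidence
    return max(totals, key=totals.get)

# Hypothetical round: two moderately confident agents say "A",
# one highly confident agent says "B".
votes = [("A", 0.4), ("A", 0.5), ("B", 0.99)]

by_count = majority_vote(votes)
by_confidence = confidence_weighted_vote(votes)
```

In this example the count-based result is "A" while the confidence-weighted result is "B", so a single overconfident agent overrides the majority. Whether that is a good outcome depends entirely on whether the 0.99 reflects real competence.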

Can prompt injection reshape multi-agent workflow without touching infrastructure?

Explores whether an attacker can manipulate how a planner assigns tasks and routes coordination purely through prompt crafting, without modifying agents, tools, or messages. This matters because it identifies a planning-time vulnerability most defenses miss.

Does token spending drive multi-agent research performance?

When does adding more agents actually help systems?

Why do multi-agent LLM systems fail more than expected?

Why do protocol-based tool integrations fail in production workflows?

Can a separate trained curator improve skill libraries better than frozen agents?

Can small language models handle most agent tasks?

Can language models discover new expertise through collaborative weight search?

Are multi-agent systems actually intelligent coordination or just token spending?

How does workflow position shape attack propagation in multi-agent systems?