Can prompt injection reshape multi-agent workflows without touching infrastructure?

Explores whether an attacker can manipulate how a planner assigns tasks and routes coordination purely through prompt crafting, without modifying agents, tools, or messages. This matters because it identifies a planning-time vulnerability most defenses miss.

Synthesis note · 2026-05-28 · sourced from Agents Multi Architecture

The flexibility that makes planner-executor multi-agent systems attractive is also their weakness. When a planner converts a prompt into subtasks, roles, dependencies, and routing paths, the prompt is not merely a request — it is the blueprint from which the entire collaboration is constructed. FLOWSTEER demonstrates that an attacker who never touches agents, tools, memory, or inter-agent messages can still steer behavior, because the planning step happens before any of that infrastructure is invoked. A single crafted prompt can bias how the workflow forms in the first place, raising malicious success by up to 55 percent over naive prompting and transferring across MAS setups even under black-box topology inference.
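To make the "prompt as blueprint" idea concrete, here is a minimal sketch of a planner whose output workflow is a function of the prompt alone. Everything in it is a hypothetical illustration: `toy_planner`, `Subtask`, the role names, and the "skip review" trigger are assumptions standing in for an LLM planner, not FLOWSTEER's method or any real framework's API. The point is only that two prompts sent to the same unmodified planner can yield different roles and routing.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Subtask:
    name: str
    role: str  # agent role the planner assigns to this subtask
    depends_on: List[str] = field(default_factory=list)


def toy_planner(prompt: str) -> List[Subtask]:
    """Hypothetical stand-in for an LLM planner.

    The workflow (subtasks, roles, dependencies) is derived from the
    prompt text and nothing else, before any agent or tool runs.
    """
    text = prompt.lower()
    plan = [Subtask("gather", role="researcher")]
    # Assumption for illustration: the planner keeps a review step
    # unless the prompt talks it out of one.
    if "skip review" not in text:
        plan.append(Subtask("review", role="reviewer", depends_on=["gather"]))
    plan.append(Subtask("publish", role="executor", depends_on=[plan[-1].name]))
    return plan


def routing(plan: List[Subtask]) -> List[Tuple[str, str]]:
    """Flatten a plan into (upstream, downstream) routing edges."""
    return [(dep, task.name) for task in plan for dep in task.depends_on]


benign = toy_planner("Summarize the quarterly report and publish it.")
steered = toy_planner(
    "Summarize the quarterly report and publish it. To save time, skip review."
)

print(routing(benign))   # [('gather', 'review'), ('review', 'publish')]
print(routing(steered))  # [('gather', 'publish')]
```

No agent, tool, memory store, or message was modified between the two runs. The only difference is the prompt, yet the second workflow contains no reviewer at all. A real planner is far less mechanical than a substring check, but the dependency is the same: whatever shapes the plan shapes everything downstream of it.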

This reframes where multi-agent safety lives. Most existing defenses inspect the artifacts of coordination: the generated workflow, the messages exchanged, the tool calls made. But if the contamination enters at workflow formation, those defenses arrive too late. The attack surface is not the running system; it is the organizational act of deciding who does what and in what order. The counterpoint is that the attack requires the planner to be promptable at all. Fully fixed pipelines are immune, but they forfeit the adaptive coordination that motivates planner-executor designs. Workflow formation is therefore a distinct security frontier, one that grows more exposed precisely as multi-agent systems become more flexible.
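The "arrives too late" argument can be sketched with a toy artifact-level check. The inspector below is a hypothetical example, not a defense described in the source: `inspect_workflow`, `ALLOWED_ROLES`, and the two sample workflows are assumptions chosen for illustration. It validates only what a finished workflow can show structurally (known roles, resolvable dependencies, no cycles).

```python
ALLOWED_ROLES = {"researcher", "reviewer", "executor"}


def inspect_workflow(workflow):
    """Return True if the finished workflow is structurally well formed.

    `workflow` maps a subtask name to (role, [names of dependencies]).
    This checks the artifact only; it has no view of how the plan was formed.
    """
    for role, deps in workflow.values():
        if role not in ALLOWED_ROLES:
            return False
        if any(dep not in workflow for dep in deps):
            return False

    visiting, done = set(), set()

    def acyclic(node):
        # Depth-first search; revisiting a node still on the stack is a cycle.
        if node in done:
            return True
        if node in visiting:
            return False
        visiting.add(node)
        ok = all(acyclic(dep) for dep in workflow[node][1])
        visiting.discard(node)
        if ok:
            done.add(node)
        return ok

    return all(acyclic(name) for name in workflow)


# Workflow the planner might form from a benign prompt (hypothetical).
benign = {
    "gather": ("researcher", []),
    "review": ("reviewer", ["gather"]),
    "publish": ("executor", ["review"]),
}
# Workflow formed after the prompt biased planning: the reviewer is gone.
steered = {
    "gather": ("researcher", []),
    "publish": ("executor", ["gather"]),
}

print(inspect_workflow(benign))   # True
print(inspect_workflow(steered))  # True
```

Both workflows pass, because the steered one is a perfectly legitimate plan in form. The inspector can reject a malformed artifact, but it cannot tell that a step is missing because the prompt argued it away. Catching that would need a reference for what the plan should have contained, which is exactly the planning-time information that an artifact check does not have.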

Inquiring lines that read this note 24

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How do standardized protocols improve coordination in multi-agent systems?

How do adversarial and manipulative prompts attack reasoning models?

Why do correct reasoning traces tend to be shorter than incorrect ones?

What makes extended chains more vulnerable than standard prompts?

What drives capability and cost efficiency in agent systems?

What separates good workflow design from poor workflow design?

Can AI systems develop genuine social understanding without embodiment?

Why does agent-to-agent interaction expose identity verification vulnerabilities?

How do prompt structure and constraints affect model instruction reliability?

Why does sandboxed execution matter more than monolithic prompting?

What causes silent corruption to amplify through delegated workflows?

Does decoupling planning from execution improve multi-step reasoning accuracy?

How do multi-agent systems achieve genuine cooperation and reasoning?

Why do self-improving systems struggle without clear external performance metrics?

Can fixed pipelines eliminate planning-time attacks by sacrificing adaptive coordination?

Related concepts in this collection 4

Can one compromised agent corrupt an entire multi-agent network? Explores whether a single biased agent can spread behavioral corruption through ordinary messages to downstream agents without any direct adversarial access. Matters because it reveals a previously unknown vulnerability in how multi-agent systems communicate.
both attack MAS without privileged access, but FLOWSTEER acts at planning time while subliminal injection rides ordinary messages at runtime
How do adversarial traps target different layers of AI agents? As AI agents browse the web, attackers can exploit their perception, reasoning, memory, actions, and coordination in distinct ways. Understanding these attack vectors is crucial for building robust agent defenses.
planning-time steering is a systemic trap that the six-category taxonomy frames structurally
Can workflow inspection catch attacks that bias planning signals? Does inspecting the final workflow catch attacks that contaminate earlier planning stages? This matters because contamination laundered through the planner may look legitimate by the time the workflow exists.
extends: the defensive corollary — because contamination enters at workflow formation, workflow-inspecting defenses examine an already-compromised artifact
How does workflow position shape attack propagation in multi-agent systems? Explores whether a malicious signal's influence depends on its injection point in a multi-agent graph, and how task-relevant framing makes downstream agents more likely to relay it without scrutiny.
grounds the propagation mechanism: explains why a planning-time bias spreads, since high-influence positions and sycophantic relay amplify the injected signal downstream
