INQUIRING LINE

How does protocol mediation affect determinism in agentic function calls?

This explores what happens to predictability when an LLM agent reaches its tools through a protocol layer (like MCP) instead of calling functions directly — and whether that mediation is the thing introducing the non-determinism.


This reads the question as asking whether the protocol layer itself (the standardized middle that sits between an agent and its tools) is what makes agentic function calls unpredictable, or whether it is incidental. The corpus has a sharp, opinionated answer at its center. One production account, "Why do protocol-based tool integrations fail in production workflows?", found that routing tool access through MCP introduced non-deterministic failures, not because the protocol was buggy, but because mediation forces the model to do two fuzzy things at runtime: pick a tool from an ambiguous menu, and infer parameters from loose descriptions. Stripping that out in favor of explicit direct function calls, with one tool per agent, restored determinism. The tell is in the survey it cites: in a 306-practitioner survey, 85% of production teams build custom agents and skip frameworks entirely. The instinct is to remove the layer that's deciding for the model.
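To make the contrast concrete, here is a minimal sketch of the two call styles. It is an illustration under stated assumptions, not the production account's code: `get_invoice`, `get_order`, `TOOL_MENU`, `mediated_call`, and `InvoiceAgent` are all hypothetical names, and the "model" is stood in for by whatever tool name and arguments the caller passes.

```python
from typing import Callable, Dict


def get_invoice(invoice_id: str) -> dict:
    """Hypothetical tool: look up one invoice by id."""
    return {"invoice_id": invoice_id, "status": "paid"}


def get_order(order_id: str) -> dict:
    """Hypothetical tool: look up one order by id."""
    return {"order_id": order_id, "status": "shipped"}


# Mediated style: a menu of tools, with the tool name and argument names
# supplied at runtime (in a real system, inferred by the model from
# natural-language descriptions).
TOOL_MENU: Dict[str, Callable[..., dict]] = {
    "get_invoice": get_invoice,
    "get_order": get_order,
}


def mediated_call(model_choice: str, model_args: dict) -> dict:
    # Both the tool name and the argument names are runtime guesses, so
    # either can be wrong in a way that only surfaces when the call executes.
    return TOOL_MENU[model_choice](**model_args)


class InvoiceAgent:
    """Direct style: one agent, one tool, bound in code rather than chosen at runtime."""

    def run(self, invoice_id: str) -> dict:
        # The only thing left to vary is the value of one typed parameter.
        return get_invoice(invoice_id)
```

In this sketch, `mediated_call("get_invoice", {"id": "inv-1"})` raises a `TypeError` because the guessed argument name is wrong, while `InvoiceAgent().run("inv-1")` has no tool name or argument name left to guess.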

But the corpus immediately complicates the 'just remove the protocol' story. A competing note, "Should coordination protocols wrap existing systems or replace them?", argues that coordination standards win adoption precisely by composing existing protocols like MCP under a shared substrate rather than replacing them, so value accrues without ecosystem rewrites. So there is a real tension here: the production lesson says mediation costs you determinism, while the coordination lesson says you can't realistically rip mediation out without losing interoperability. The synthesis isn't 'protocols bad'; it's that every inference the protocol pushes onto the model at runtime (tool selection, parameter binding) is a place where determinism leaks.

Why does that leak compound rather than stay local? Look at how coordination degrades at scale ("Why do multi-agent systems fail to coordinate at scale?"): agents accept information from neighbors without verifying it, so a single ambiguous resolution propagates as error rather than getting caught. Mediation adds exactly these uncritical handoffs. And the FLOWSTEER work shows the same surface is exploitable: a crafted prompt can bias tool routing and task assignment at planning time, before any infrastructure runs ("Can prompt injection reshape multi-agent workflow without touching infrastructure?"), with the damage amplified when it is injected into high-influence positions where dependencies converge ("How does workflow position shape attack propagation in multi-agent systems?"). The same indirection that makes tool selection non-deterministic also makes it steerable. Non-determinism and attackability turn out to be the same property viewed from two angles.
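The propagation argument can be shown with a toy sketch. This is not the AgentsNet or FLOWSTEER setup; `relay`, `is_known_tool`, and `ALLOWED_TOOLS` are hypothetical names, and the "agents" are just hops in a loop. It shows only the narrow point that an unverified handoff relays a bad resolution to every downstream hop, while a check at the handoff stops it.

```python
from typing import Callable, List, Optional

Check = Callable[[str], bool]

# Hypothetical set of tool names that downstream agents are able to execute.
ALLOWED_TOOLS = {"get_invoice", "get_order"}


def is_known_tool(name: str) -> bool:
    """A handoff check: accept only tool names from the allowed set."""
    return name in ALLOWED_TOOLS


def relay(message: str, hops: int, check: Optional[Check] = None) -> List[str]:
    """Pass a message down a chain of agents and return what each hop accepted."""
    accepted: List[str] = []
    for _ in range(hops):
        if check is not None and not check(message):
            break  # a verifying agent refuses the value, so it travels no further
        accepted.append(message)
    return accepted
```

With no check, `relay("get_invoce", hops=4)` hands the misspelled tool name to all four hops; with `check=is_known_tool`, the first hop rejects it and the list comes back empty.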

The most interesting move, though, is what the corpus offers as the alternative to fuzzy mediation: not 'less protocol' but a more legible substrate. Code, one note argues, is uniquely good for agent reasoning because it is simultaneously executable, inspectable, and stateful, so you can verify what happened rather than just hope ("Can code become the operational substrate for agent reasoning?"). That reframes the whole question. Determinism isn't lost because there's a layer between agent and tool; it's lost when that layer is a natural-language guessing game instead of something checkable. A protocol that hands the model an inspectable, verifiable call is a different animal from one that hands it an ambiguous menu, even if both are 'mediation.' The thing to fix is the inference burden, not the existence of the middle.
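One way to picture a "checkable" call is a layer that validates a proposed call against the tool's declared signature before running it. The sketch below is an assumption-laden illustration, not the method from the cited note: `get_invoice`, `check_call`, and `checked_call` are hypothetical, and the type check only handles plain classes such as `str` or `int`, not generics.

```python
import inspect
from typing import Any, Callable, Dict, List, get_type_hints


def get_invoice(invoice_id: str) -> dict:
    """Hypothetical tool: look up one invoice by id."""
    return {"invoice_id": invoice_id, "status": "paid"}


def check_call(fn: Callable[..., Any], args: Dict[str, Any]) -> List[str]:
    """Return problems with a proposed call; an empty list means it checks out."""
    try:
        # Verifies argument names and required parameters without executing fn.
        inspect.signature(fn).bind(**args)
    except TypeError as exc:
        return [f"signature mismatch: {exc}"]

    problems: List[str] = []
    hints = get_type_hints(fn)
    for name, value in args.items():
        expected = hints.get(name)
        # Only plain classes are checked here; generic annotations are skipped.
        if isinstance(expected, type) and not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


def checked_call(fn: Callable[..., Any], args: Dict[str, Any]) -> Any:
    """Execute fn only if the proposed call passes inspection."""
    problems = check_call(fn, args)
    if problems:
        raise ValueError("; ".join(problems))
    return fn(**args)
```

The design point is that the middle layer still exists, but a bad proposal fails at inspection time with a readable reason instead of failing unpredictably at execution time.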


Sources (6 notes)

Why do protocol-based tool integrations fail in production workflows?

MCP integration caused non-deterministic failures through ambiguous tool selection and parameter inference. Replacing it with explicit direct function calls and single-tool-per-agent design restored determinism. A 306-practitioner survey confirms 85% of production teams build custom agents, forgoing frameworks.

Should coordination protocols wrap existing systems or replace them?

Research shows that agent coordination standards achieve adoption by composing existing protocols like MCP and DIDComm under a shared substrate, rather than competing to replace them. Bridging lets value accrue incrementally without forcing ecosystem-wide rewrites.

Why do multi-agent systems fail to coordinate at scale?

AgentsNet benchmark shows agents fail to coordinate strategies either by agreeing too late or adopting strategies without informing neighbors. Agents accept neighbor information without verification, enabling error propagation while remaining capable of detecting direct conflicts.

Can prompt injection reshape multi-agent workflow without touching infrastructure?

FLOWSTEER demonstrates that a single crafted prompt can bias task assignment, roles, and routing during workflow formation, raising malicious success by up to 55 percent and transferring across black-box multi-agent setups. This attack surface precedes the artifacts that existing defenses inspect.

How does workflow position shape attack propagation in multi-agent systems?

FLOWSTEER demonstrates that malicious signals propagate farther when injected into high-influence subtasks, and that framing them as evidence rather than instruction causes downstream agents to relay them. Influence concentrates where dependencies converge, making position-aware attacks far more effective.

Can code become the operational substrate for agent reasoning?

Research shows code uniquely enables agents to externalize reasoning, execute policies, model environments, and verify progress through its simultaneous executability, inspectability, and statefulness across task steps.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an agentic systems researcher re-examining a live tension in protocol design. The question: does protocol mediation inherently erode determinism in function calling, or is the loss contingent on *how* mediation is implemented?

What a curated library found — and when (findings span Feb 2024–Dec 2026; treat as dated claims, not current truth):
• Production agents often skip framework protocols entirely (85% custom builds); routing via MCP introduces non-deterministic failures because the model must infer tool selection and parameters from ambiguous descriptions, not because the protocol itself is broken (~2512.08769).
• The same mediation layer that makes tool routing non-deterministic also makes it exploitable — crafted prompts can bias routing and task assignment at planning time, with damage amplified in high-influence workflow positions (~2603.00131, ~2605.11514).
• Determinism leaks not from *having* a protocol layer, but from the layer forcing fuzzy inference (natural-language guessing) instead of checkable, executable, inspectable reasoning — code-based call harnesses preserve determinism because they are simultaneously executable and auditable (~2605.18747).
• Multi-agent coordination degrades predictably with scale when agents accept neighbor information without verification, propagating ambiguous resolutions as uncritical handoffs (~2507.08616).

Anchor papers (verify; mind their dates):
• arXiv:2512.08769 (Dec 2025) — production agentic workflows
• arXiv:2605.11514 (May 2026) — FLOWSTEER planning-time vulnerabilities
• arXiv:2605.18747 (May 2026) — code as agent harness
• arXiv:2507.08616 (July 2025) — AgentsNet coordination

Your task:
(1) RE-TEST THE INFERENCE-BURDEN THESIS. The library claims determinism loss tracks to *model inference cost* (tool selection, param binding) not *protocol existence*. Has the emergence of (a) stronger reasoning models, (b) structured output enforcement, (c) semantic caching of tool specs, or (d) pre-execution verification harnesses since mid-2026 actually *flattened* this curve? Separately: does the code-as-harness framing (2605.18747) hold up in production, or do models still hallucinate invalid calls? Cite what resolved or didn't.
(2) Surface the sharpest work in the last ~6 months showing either that determinism *can* be preserved *through* mediation (contradicting the production account), or that the tension between interoperability and determinism is *itself* resolvable in a way the library missed.
(3) Propose two research questions assuming the regime has moved: (a) What is the *minimal sufficient* mediation layer that preserves determinism while retaining interop? (b) Can formal verification of tool-call correctness before execution, at the protocol level, become a commodity feature rather than bespoke engineering?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
