
Multi-agent systems and the approval problem

When AI agents spawn other AI agents, the human-in-the-loop quietly disappears. Here's why multi-agent architectures need explicit approval gates at every delegation boundary — and what happens when they don't.

The delegation chain nobody thinks about

A typical AI workflow today might look like this: a user asks an orchestrator agent to "deploy the latest build to staging." The orchestrator decides to spawn a deployment agent. The deployment agent spawns a testing agent to verify prerequisites. The testing agent runs a shell command that modifies infrastructure configuration.

Four hops from the human. One approval decision (made by the user at step one, when they said "deploy to staging"). And a shell command at the end that nobody explicitly authorised.

This is the multi-agent approval problem: delegation chains dilute human oversight until it disappears entirely.

Human → Orchestrator → Deployment Agent → Testing Agent → shell: terraform apply
  ↑                                                         ↑
  approval given here                                       executes here (four hops later)

Why this is fundamentally different from a single agent

With a single agent, the scope of damage is bounded by the scope of the original task. If you ask an agent to "check server health," it shouldn't be able to take down a database — and if it tries, that's clearly out of scope.

Multi-agent systems break this bounded scope in three ways:

1. Scope laundering

Each delegation step can legitimately expand the scope of the task. "Deploy to staging" reasonably implies "run tests first," which reasonably implies "set up test fixtures," which reasonably implies "modify database schema." By the end, actions that would never have been approved directly are happening under the umbrella of the original approval.

2. Trust inheritance without verification

When an orchestrator spawns a sub-agent, that sub-agent typically inherits the credentials and trust level of its parent. There's no attestation that the sub-agent is doing what it claims — a compromised or misbehaving sub-agent can act with full parent authority.

3. Prompt injection amplification

A prompt injection attack against a leaf agent in a multi-agent pipeline can be far more damaging than one against a standalone agent. The injected instruction runs with the authority of the orchestrator — not just the leaf agent. The attack surface compounds with every layer of delegation.

The naive solutions that don't work

The tempting fixes sound reasonable until you think them through:

Give sub-agents fewer permissions. This helps with credential scope, but doesn't solve the approval problem. A sub-agent with narrower credentials can still take destructive actions within its allowed scope. "Read-only database access" still lets you exfiltrate your entire customer list.

Log everything. Logging is necessary but not sufficient. An audit trail tells you what happened after the fact. It doesn't stop the harmful action from occurring, and by the time you're reading logs, the irreversible action has already completed.

Define strict agent scopes upfront. Scopes drift. What starts as "deployment agent can restart services" becomes "deployment agent can restart services and clear caches" and then "deployment agent can restart services, clear caches, and run migrations." Each expansion is individually justified; the cumulative result is unconstrained.

What actually works: approval gates at delegation boundaries

The right model treats each delegation boundary as an approval boundary. When an agent wants to delegate to a sub-agent — or when any agent wants to take an action with side effects — that action requires the same human approval as if the top-level orchestrator had requested it directly.

Key principle

Human approval given at delegation step N does not automatically authorise actions at delegation step N+1. Every consequential action needs its own approval — regardless of how many layers of delegation separate it from the original human instruction.

This isn't as slow as it sounds. Most actions in a multi-agent pipeline are genuinely routine — the kind of thing you'd add to a whitelist after seeing it once. The goal isn't to approve every single command; it's to ensure that novel or high-risk actions surface to a human rather than being automatically authorised by a delegation chain.
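This principle can be sketched as a single gate function that every consequential action passes through, whatever its depth in the delegation chain. The names here (`gate`, `request-approval` semantics, the whitelist set) are illustrative, not part of any real API:

```python
# Hypothetical sketch of a per-action approval gate. A routine command is
# auto-approved from the whitelist; anything novel is escalated to a human
# reviewer, and can optionally be whitelisted after its first approval.

ROUTINE_WHITELIST = {"git status", "kubectl get pods"}

def gate(command: str, approver) -> bool:
    """Every consequential action passes through this gate,
    no matter how deep in the delegation chain it originates."""
    if command in ROUTINE_WHITELIST:
        return True                      # routine: auto-approve
    approved = approver(command)         # novel or high-risk: ask a human
    if approved:
        ROUTINE_WHITELIST.add(command)   # whitelist after first approval
    return approved

# A stand-in for the human reviewer:
def always_deny(command: str) -> bool:
    return False

print(gate("git status", always_deny))      # True: whitelisted
print(gate("terraform apply", always_deny)) # False: escalated, denied
```

Note that the gate takes the approver as a parameter: the decision-maker is deliberately outside the agent's own context, which matters again in the "agents approving agents" discussion below.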

Classifying actions across delegation levels

A practical framework for what needs approval in multi-agent systems:

| Action type | Single agent | Sub-agent (inherited trust) | Sub-agent (fresh grant) |
|---|---|---|---|
| Read-only (files, APIs, DBs) | Auto-approve | Auto-approve | Auto-approve |
| Idempotent writes | Whitelist | Require approval | Whitelist after 1 approval |
| State-changing writes | Require approval | Require approval | Require approval |
| Irreversible actions | Require approval + confirm | Require approval + confirm | Require approval + confirm |
| Cross-system actions | Require approval | Block unless explicitly granted | Require approval |
| Credential use / key access | Require approval | Block unless explicitly granted | Require approval |

The key column is "sub-agent (inherited trust)" — these actions require escalated scrutiny precisely because they come from a delegation chain. A sub-agent claiming to act on behalf of an orchestrator that claimed to act on behalf of a user is a weaker chain of authority than a direct user request.
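The policy table above is small enough to encode directly as a lookup, so agents can resolve "what gate does this action need?" programmatically. The action-type and trust-tier names below are illustrative shorthand for the table's rows and columns:

```python
# Policy table as a lookup: action type x trust tier -> required gate.
POLICY = {
    "read_only":      {"single": "auto",            "inherited": "auto",            "fresh": "auto"},
    "idempotent":     {"single": "whitelist",       "inherited": "approve",         "fresh": "whitelist_after_1"},
    "state_changing": {"single": "approve",         "inherited": "approve",         "fresh": "approve"},
    "irreversible":   {"single": "approve_confirm", "inherited": "approve_confirm", "fresh": "approve_confirm"},
    "cross_system":   {"single": "approve",         "inherited": "block",           "fresh": "approve"},
    "credential_use": {"single": "approve",         "inherited": "block",           "fresh": "approve"},
}

def required_gate(action_type: str, trust_tier: str) -> str:
    return POLICY[action_type][trust_tier]

print(required_gate("cross_system", "inherited"))  # "block"
print(required_gate("read_only", "fresh"))         # "auto"
```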

Implementing this with expacti

The expacti approval model maps directly onto multi-agent architectures. Each agent, whether orchestrator or sub-agent, submits commands through the approval channel. The reviewer sees the command, its context, and its risk score. The delegation chain grants no implicit authority: what the reviewer judges is the action being taken, not who in the chain requested it.

# Python: orchestrator spawning sub-agent via expacti
from expacti import ExpactiClient

# Each agent has its own expacti connection
orchestrator = ExpactiClient(backend_url=EXPACTI_URL, token=ORCHESTRATOR_TOKEN)
deployment_agent = ExpactiClient(backend_url=EXPACTI_URL, token=DEPLOY_TOKEN)

# When the deployment agent wants to run terraform apply,
# it must get approval — regardless of what the orchestrator approved
result = deployment_agent.run("terraform apply -auto-approve -target=module.staging")
# Reviewer sees this in the approval queue.
# The fact that an orchestrator "approved" the deployment is irrelevant —
# terraform apply is a consequential action that needs its own gate.

For multi-agent pipelines at scale, you'll want to configure different approval policies by agent tier:

# config.toml
[whitelist]
# Orchestrator commands: approve once, whitelist forever
[[whitelist.rules]]
pattern = "deploy-orchestrator *"
type = "glob"
risk_level = "low"

# Leaf agent commands: always require fresh approval for state changes
[[whitelist.rules]]
pattern = "terraform *"
type = "glob"
risk_level = "high"
requires_fresh_approval = true

The meta-problem: agents approving agents

There's a tempting shortcut: have the orchestrator act as the "reviewer" for sub-agent commands. Orchestrators can evaluate risk and make approval decisions quickly, without human latency.

This solves the wrong problem. The value of human approval isn't speed — it's independence. An orchestrator and its sub-agents share the same prompt context, the same potential for prompt injection, and the same misaligned goal interpretation. Having an orchestrator approve a sub-agent command provides no additional safety signal; it's just bureaucracy for its own sake.

Effective oversight requires an independent decision-maker who wasn't party to the original context. That's what makes human approval meaningful — not just that a review occurred, but that the reviewer brought independent judgment.

Automated approval pipelines (agents reviewing agents) can be useful for rate-limiting and preliminary filtering. But they shouldn't replace human review for high-risk actions — they should complement it by reducing the review queue to only the commands that genuinely need human judgment.
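That filter-then-human split can be sketched as a triage step: the automated stage passes routine commands straight through and routes everything else into a human review queue, rather than deciding itself. The names here are illustrative:

```python
# Two-stage triage sketch: automated pre-filter + human review queue.
# The filter never approves or denies a flagged command on its own; it
# only decides which commands actually need a human's attention.

ROUTINE = {"git status", "ls", "kubectl get pods"}

def triage(commands):
    auto, human_queue = [], []
    for cmd in commands:
        (auto if cmd in ROUTINE else human_queue).append(cmd)
    return auto, human_queue

auto, queue = triage(["ls", "terraform apply", "git status"])
print(queue)  # ['terraform apply'] -- only this reaches the human
```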

Design patterns for safer multi-agent systems

A few concrete patterns that apply the approval-gate model to multi-agent architectures:

Explicit authority grants

When spawning a sub-agent, explicitly enumerate what actions it's authorised to take — don't inherit ambient authority. If an orchestrator has rights to deploy and rights to modify configs, the deployment sub-agent should only inherit the deployment rights. Config modification requires re-escalation.
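A minimal sketch of this pattern, assuming a capability-set model (all class and grant names here are hypothetical): spawning requires an explicit grant list, and the parent can only hand down a subset of what it holds.

```python
# Capability subsetting sketch: a sub-agent receives an explicit set of
# grants, never the parent's full ambient authority.

class Agent:
    def __init__(self, name, grants):
        self.name = name
        self.grants = frozenset(grants)

    def spawn(self, name, grants):
        requested = set(grants)
        if not requested <= self.grants:
            # Refuse any grant the parent does not itself hold.
            raise PermissionError(f"cannot grant {requested - self.grants}")
        return Agent(name, requested)

orch = Agent("orchestrator", {"deploy", "modify_config"})
deployer = orch.spawn("deployment-agent", {"deploy"})  # OK: strict subset
# orch.spawn("rogue", {"delete_db"}) would raise PermissionError
```

The deployment sub-agent here never sees `modify_config`; touching configuration would require re-escalation through the orchestrator, and ultimately through the approval channel.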

Action attestation

Require sub-agents to attest which top-level task they're executing on behalf of. This creates a traceable link from the original user request to every leaf action — and makes it easier for reviewers to understand context when evaluating a command.
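The simplest form of this is a chain of task identifiers that each delegation step appends to, so any leaf action carries its full provenance. The identifier format below is made up for illustration:

```python
# Attestation sketch: each delegation step appends its task ID, so a
# reviewer can trace a leaf command back to the original user request.

def attest(parent_chain, task_id):
    return parent_chain + [task_id]

user_task = attest([], "user:deploy-staging")
deploy    = attest(user_task, "orchestrator:run-deploy")
tf        = attest(deploy, "deploy-agent:terraform-apply")

print(" -> ".join(tf))
# user:deploy-staging -> orchestrator:run-deploy -> deploy-agent:terraform-apply
```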

Scope checkpoints

Build explicit checkpoints in long-running multi-agent pipelines where a human must confirm before the pipeline proceeds to the next phase. "Tests passed. Ready to apply migrations. Continue?" is more valuable than a silent auto-approve that only surfaces in logs after something goes wrong.
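A checkpoint is just a confirmation callback at each phase boundary; in practice the callback would block on a human prompt or an approval queue. This sketch (all names illustrative) stops the pipeline the moment confirmation is withheld:

```python
# Checkpoint sketch: a pipeline pauses at each phase boundary and asks a
# confirm callback before continuing. Denial halts the pipeline.

def run_pipeline(phases, confirm):
    completed = []
    for name, action in phases:
        if not confirm(f"Ready to run phase '{name}'. Continue?"):
            break
        action()
        completed.append(name)
    return completed

log = []
phases = [("tests",      lambda: log.append("tests")),
          ("migrations", lambda: log.append("migrations"))]

# Simulated reviewer: approves tests, stops before migrations.
answers = iter([True, False])
print(run_pipeline(phases, lambda msg: next(answers)))  # ['tests']
```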

Anomaly escalation

Multi-agent systems can behave unexpectedly in ways that none of the individual agents are "wrong" about. Monitor for action patterns that don't match the original task description — and require fresh human review when the pipeline appears to be doing something the original user didn't ask for.
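As a deliberately naive illustration: flag any action that touches a system the original task never mentioned. A real implementation would use risk models rather than keyword matching; the system names and matching logic here are purely illustrative.

```python
# Naive anomaly check sketch: escalate when an action touches a system
# that the original task description never mentioned.

def needs_escalation(task_description, action, known_systems):
    mentioned = {s for s in known_systems if s in task_description}
    touched   = {s for s in known_systems if s in action}
    return bool(touched - mentioned)

systems = {"staging", "production", "billing-db"}
task = "deploy the latest build to staging"

print(needs_escalation(task, "terraform apply -target=staging", systems))  # False
print(needs_escalation(task, "psql billing-db -c 'DROP TABLE orders'", systems))  # True
```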

The uncomfortable truth

Multi-agent systems are powerful precisely because they can accomplish complex tasks by decomposing them into simpler sub-tasks. But this decomposition also decomposes accountability. The more sophisticated your agent pipeline, the more important it is to have explicit approval gates — because the more opportunities there are for things to go subtly wrong in ways that no individual component is responsible for.

The goal isn't to slow down your agent pipelines. It's to make sure that when something goes wrong — and eventually, it will — you have a clear record of what was approved, by whom, and when. And that the actions that had the most potential for harm had the highest scrutiny.

Add approval gates to your multi-agent pipeline

expacti integrates with any agent framework. Commands are submitted for approval before execution — your pipeline pauses, a human decides, and only approved actions run.

Start for free →