Limiting blast radius: how to scope what your AI agent can touch
The principle of least privilege is the oldest rule in security. Your AI agent is probably violating it right now.
Every security engineer knows the principle of least privilege: grant only the minimum access required to do the job. It's been core doctrine since the 1970s. And yet, when teams deploy AI coding agents today, they routinely hand them unrestricted shell access, full database credentials, and production SSH keys.
The reasoning is understandable: restricting access would make the agent less useful. But "less useful" is a different problem than "capable of deleting your production database at 3am." The question isn't whether to restrict — it's how to restrict in a way that doesn't kill productivity.
Why the blast radius problem is real
The blast radius of an AI agent failure scales with its permissions. An agent with read-only access to a staging environment can, at worst, leak some data. An agent with root on your production cluster can destroy everything.
And agents do fail. Not necessarily through malice — through confusion, hallucination, misunderstood context, or being passed a prompt that leads them somewhere unexpected. The question isn't whether your agent will ever do something wrong; it's what happens when it does.
"The expected cost of an incident is (probability of failure) × (blast radius). You can't easily control the probability, but you can absolutely control the blast radius."
Least privilege is blast radius control. Most teams focus on probability (better prompts, more testing, careful model selection) and ignore blast radius entirely. That's backwards.
The four dimensions of agent scope
When thinking about what an AI agent can touch, there are four dimensions along which to scope it:
1. System access
What can the agent execute on? This is the most obvious dimension: which servers, which cloud accounts, which Kubernetes namespaces. The principle here is simple — give the agent access to the smallest environment that lets it do its job.
A coding agent reviewing PRs doesn't need production access. A deployment agent doesn't need access to your data science cluster. An on-call triage agent doesn't need write access to anything — it needs read access to everything relevant, and write access to exactly one thing (your incident tracker).
Scoping by environment
- Development: unrestricted is usually fine — the worst case is a broken dev box
- Staging: read/write but no credentials for external services
- Production: treat like giving a junior engineer root — in other words, don't, unless reviewed
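The environment tiers above can be sketched as a small policy map. This is a minimal illustration, not a real configuration format; the field names and defaults are assumptions:

```python
# Hypothetical per-environment policy map. The key idea: anything not
# explicitly known defaults to the strictest tier.
ENV_POLICIES = {
    "development": {"writes": "allow",  "external_creds": True},
    "staging":     {"writes": "allow",  "external_creds": False},
    "production":  {"writes": "review", "external_creds": False},
}

def write_policy(env):
    """Return the write policy for an environment; unknown envs get 'review'."""
    return ENV_POLICIES.get(env, {"writes": "review"})["writes"]
```

Note the failure mode this avoids: an environment nobody thought to list falls back to human review, not to open access.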
2. Command scope
Within a given system, what can the agent actually run? This is where whitelist-based approaches live. Instead of saying "the agent can run anything on this server," you say "the agent can run these specific commands."
The naive whitelist is just a list of allowed binaries: git, docker, kubectl. But binary-level whitelisting is too coarse. kubectl delete and kubectl get are very different commands that happen to invoke the same binary.
A better whitelist operates at the command level — and distinguishes between read-only operations (always safe, no approval needed) and mutating operations (require explicit approval).
| Category | Examples | Risk | Recommended policy |
|---|---|---|---|
| Read-only | git log, kubectl get, df -h | Low | Auto-approve via whitelist |
| Reversible writes | git commit, docker restart, kubectl rollout restart | Medium | Approve once, whitelist pattern |
| Irreversible writes | kubectl delete, rm -rf, DROP TABLE | High | Always require human review |
| Privilege escalation | sudo, su, chmod 777 | Critical | Block or require 2-reviewer approval |
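The tiers in the table above can be expressed as a small pattern-based classifier. A minimal sketch, assuming regex matching on the command string; the patterns shown cover only the table's examples, and ordering matters (most dangerous tiers are checked first):

```python
import re

# Illustrative risk tiers keyed by command patterns. These rules mirror
# the table above; a real ruleset would be far more complete.
RULES = [
    ("critical",     re.compile(r"^(sudo|su)\b|chmod\s+777")),
    ("irreversible", re.compile(r"^kubectl\s+delete\b|^rm\s+-rf\b|\bDROP\s+TABLE\b", re.I)),
    ("reversible",   re.compile(r"^git\s+commit\b|^docker\s+restart\b|^kubectl\s+rollout\s+restart\b")),
    ("read_only",    re.compile(r"^git\s+log\b|^kubectl\s+get\b|^df\b")),
]

def classify(command):
    for tier, pattern in RULES:
        if pattern.search(command):
            return tier
    return "unknown"  # unmatched commands fall through to human review
```

The "unknown" default is the important design choice: a command the classifier has never seen gets a human, not a pass.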
3. Data scope
What data can the agent read or modify? This is often overlooked because data access is less visible than system access. But an agent with unrestricted database access can do enormous damage — not through destructive commands, but through data exfiltration or corruption.
Practically: create dedicated database users for agents with the minimum required permissions. If an agent only needs to read from user_events, don't give it a connection string that can write to users. If it doesn't need access to PII columns, use column-level grants.
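In Postgres-style SQL, the paragraph above comes down to a handful of grants. A sketch, assuming Postgres column-level privileges; the role name agent_ro is hypothetical, and the table names come from the example above:

```python
# Grants for a dedicated read-only agent role. What's *omitted* is the
# restriction: no INSERT/UPDATE/DELETE is ever granted.
AGENT_ROLE = "agent_ro"  # hypothetical role name

grants = [
    # Full read access to the one table the agent actually needs:
    f"GRANT SELECT ON user_events TO {AGENT_ROLE};",
    # Column-level grant: expose users without its PII columns:
    f"GRANT SELECT (id, created_at) ON users TO {AGENT_ROLE};",
]
```

Hand the agent a connection string for this role and the database itself enforces the scope, regardless of what the agent tries to run.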
4. Time scope
This one is underused: limiting when the agent can act. A deployment agent probably shouldn't be pushing changes to production at 2am without extra scrutiny. A maintenance agent running during business hours gets reviewed immediately; the same agent running overnight might have its timeout policy set to auto-deny instead of auto-approve.
Time-based policies are a form of anomaly detection. Most legitimate operations happen within expected hours. If your agent is trying to do something unusual at an unusual time, that's a signal — not necessarily malicious, but worth a human look.
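A time-scoped timeout policy can be as simple as a clock check. A minimal sketch; the business-hours window is an assumption to adjust per team:

```python
from datetime import datetime, time

# Hypothetical business-hours window.
BUSINESS_START, BUSINESS_END = time(8, 0), time(18, 0)

def timeout_policy(now):
    """Decide what happens when an approval request times out unanswered.
    Outside business hours, the safe default flips to deny."""
    in_hours = BUSINESS_START <= now.time() <= BUSINESS_END
    return "auto_approve_on_timeout" if in_hours else "auto_deny_on_timeout"
```

The same agent, the same command, but the 2am request waits for a human instead of sailing through.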
The problem with hard restrictions
Here's the tension: if you restrict the agent too tightly, it can't do its job, so developers route around it. They give the agent broader access "just for this task," forget to remove it, and you end up with worse security than if you'd never restricted it at all.
The answer isn't to give up on restrictions — it's to make them dynamic rather than static.
Static restrictions say: "this agent can never run X." Dynamic restrictions say: "this agent needs human approval to run X, but once approved in this context, similar commands are fast-tracked." The agent stays useful; the human stays in the loop for novel or risky operations.
This is the difference between an access control list (binary: can/can't) and an approval workflow (graduated: auto-approve known-safe, review unknown, block dangerous). The whitelist is the persistent memory of what's already been reviewed and approved.
A practical framework
When scoping a new AI agent deployment, work through this list:
- Define the agent's job in one sentence. If you can't do this, the agent probably has too broad a mandate.
- List every category of action it needs to take. Not individual commands — categories. "Read git history," "create commits," "run tests," "restart services."
- For each category, classify: read-only, reversible, irreversible. Irreversible operations always need human review, at least initially.
- Start with manual approval for everything. Build up the whitelist through real usage. After a week, you'll know which commands are genuinely routine and which ones need eyes on them.
- Set anomaly thresholds. What's unusual for this agent? Unusual times, unusual targets, unusual command frequency. Flag these for review even if the individual command looks safe.
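The checklist above can be captured as a scope declaration per agent. A sketch with made-up values; every field name here is illustrative, not a real schema:

```python
# Hypothetical scope declaration following the five checklist steps.
agent_scope = {
    "job": "review pull requests and suggest fixes",   # one sentence
    "actions": {                                       # categories, not commands
        "read git history": "read_only",
        "create commits":   "reversible",
        "run tests":        "read_only",
        "force-push":       "irreversible",
    },
    "initial_policy": "manual_approval_for_everything",
    "anomaly_thresholds": {"max_commands_per_hour": 60},
}

# Irreversible categories always start under human review:
needs_review = [action for action, risk in agent_scope["actions"].items()
                if risk == "irreversible"]
```

Writing the scope down as data has a side benefit: it can be diffed and reviewed like any other config change.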
What this looks like in practice
Here's an example: a CI/CD agent responsible for deployments. Its job in one sentence: "take a passing build and get it to production."
Its action categories:
- Pull and inspect the build artifact — read-only, auto-approve
- Run smoke tests against staging — read-only, auto-approve
- Tag the release in git — reversible, auto-approve after initial review
- Deploy to staging — reversible (can roll back), require approval first time
- Deploy to production — irreversible in terms of exposure, always require review
- Roll back production — reversible but urgent, approve with 1-minute timeout
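The category list above translates directly into a policy table. Step names, policy labels, and the timeout value are illustrative:

```python
# The deployment agent's action categories, expressed as a policy table.
PIPELINE_POLICY = {
    "fetch_artifact":      {"risk": "read_only",    "approval": "auto"},
    "smoke_test_staging":  {"risk": "read_only",    "approval": "auto"},
    "tag_release":         {"risk": "reversible",   "approval": "auto_after_first"},
    "deploy_staging":      {"risk": "reversible",   "approval": "first_time_only"},
    "deploy_production":   {"risk": "irreversible", "approval": "always"},
    "rollback_production": {"risk": "reversible",   "approval": "always",
                            "timeout_s": 60},       # urgent: short review window
}

def requires_human(step):
    # Treat first-time-only steps as human-gated until their first approval
    # is recorded (state tracking omitted in this sketch).
    return PIPELINE_POLICY[step]["approval"] in ("always", "first_time_only")
```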
The agent can run most of its workflow automatically. The human-in-the-loop moments are exactly where you'd want them: production deploys and rollbacks. Everything else is noise-free automation.
After the first month, you review the audit log. How often did the reviewer say no? If it's never, you might be approving too quickly — the review is becoming a rubber stamp, which is as dangerous as no review. If it's always, the agent's scope is too broad. The approval rate tells you whether your scoping is calibrated correctly.
The audit trail as scope feedback
One underappreciated benefit of human-in-the-loop approval: the audit trail tells you, over time, whether your scoping decisions were correct.
If the same command is being approved hundreds of times without ever being denied, it should be in your whitelist — you're adding friction with no security benefit. If a command is being denied regularly, that's a signal the agent's scope is too broad, or its prompting needs work.
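That promotion rule is easy to compute from the audit log. A minimal sketch, assuming log entries of (command, decision) pairs; the threshold of 100 approvals is an arbitrary illustration:

```python
from collections import Counter

def whitelist_candidates(audit_log, min_approvals=100):
    """Commands approved many times and never denied are candidates for
    promotion to the whitelist."""
    approved, denied = Counter(), Counter()
    for command, decision in audit_log:
        (approved if decision == "approved" else denied)[command] += 1
    return [cmd for cmd, n in approved.items()
            if n >= min_approvals and denied[cmd] == 0]
```

The inverse query, commands with a high denial rate, is your list of scope problems to investigate.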
The approval workflow isn't just a safety mechanism — it's a feedback loop. Over time, it naturally drives toward the right level of automation: high-confidence routine operations are whitelisted, genuinely risky operations stay reviewed, and you have evidence for both decisions.
Scope your agents, then automate safely
Expacti gives you fine-grained control over what your AI agents can do — with a whitelist engine, risk scoring, anomaly detection, and full audit trail. Start with manual review for everything; let the data tell you what's safe to automate.
Get started free →