Your AI agent is about to run kubectl delete namespace production. You have linters. You have OPA policies. You have IAM roles scoped to least privilege. And none of them will stop this command, because the agent generated it thirty seconds ago in response to a prompt that said "clean up the unused namespaces," and every pre-execution check evaluated a different version of reality than the one about to hit your cluster.
This is the fundamental problem with relying on static analysis and pre-flight policies for AI agent safety. They operate on a snapshot. The agent operates in real time. The gap between those two things is where incidents happen.
Layer 1: static analysis — the wrong version of the code
Static analysis tools — linters, Bandit, Semgrep, CodeQL — are excellent at finding known vulnerability patterns in code that exists on disk. They scan source files, match against rules, and flag issues before the code ships. For traditional software development, this is a critical layer.
But AI agents do not work like traditional software. They generate commands and code dynamically at runtime, in response to prompts, context, and intermediate results. The command an agent decides to run at 2:47 PM did not exist at 2:46 PM. No linter ran against it. No semgrep rule matched it. It was born in a transformer's attention layer and passed directly to a shell.
Consider this Python agent that builds shell commands from user input:
```python
# agent.py — a typical AI-powered ops assistant
import subprocess

from llm import generate_command

def handle_task(user_request: str, context: dict):
    # The LLM generates a shell command based on the request
    cmd = generate_command(user_request, context)

    # Static analysis saw this line during CI.
    # It flagged subprocess.run as "potentially dangerous."
    # But it has no idea what `cmd` will contain at runtime.
    # It could be "ls -la" or "rm -rf /" — the linter cannot tell.
    result = subprocess.run(cmd, shell=True, capture_output=True)
    return result.stdout.decode()
```
Bandit will flag subprocess.run(..., shell=True) as a B602 warning. Semgrep might match a rule about unsanitized shell execution. But neither tool can evaluate what cmd will contain when the function actually runs. The variable is populated by an LLM whose output depends on a prompt, a conversation history, and retrieved context that changes with every invocation.
Static analysis answers the question: "Does this code look dangerous?" That is a useful question. It is not the same question as: "Is this specific command, right now, safe to execute?"
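The difference between those two questions can be made concrete with a toy pattern matcher. This is a deliberate caricature, not how Bandit actually works (Bandit parses the AST, not raw text), but it shows why the finding is identical regardless of what the command turns out to be:

```python
import re

def static_check(source: str) -> list[str]:
    """A toy pattern matcher in the spirit of Bandit/Semgrep:
    it flags risky-looking constructs in source text, nothing more."""
    findings = []
    if re.search(r"subprocess\.run\(.*shell=True", source):
        findings.append("shell=True: potentially dangerous (cf. Bandit B602)")
    return findings

AGENT_SOURCE = 'result = subprocess.run(cmd, shell=True, capture_output=True)'

# The static layer flags the pattern once, at scan time.
print(static_check(AGENT_SOURCE))

# At runtime, `cmd` could be either of these, and the static
# finding above is exactly the same for both:
harmless = "ls -la"
destructive = "rm -rf /"
```

The check fires on the shape of the code, so it cannot distinguish the harmless invocation from the destructive one.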
Layer 2: pre-flight policies — intent vs. actual effect
Pre-flight policy engines — OPA (Open Policy Agent), Kyverno, Cedar, IAM condition keys — evaluate whether a requested action is permitted according to a set of rules. They are the gatekeepers of intent. "Does this principal have permission to apply Kubernetes manifests in the production namespace?" Yes or no.
The problem is that permission to perform an action and the safety of a specific invocation of that action are different things. An IAM policy that grants kubectl apply permission in the production namespace is evaluating the verb, not the manifest. A Kyverno policy that allows deployments with certain labels is evaluating the schema, not the consequences.
The kubectl apply trap: An approved kubectl apply -f deploy.yaml may create a new service. Or, run with --prune, it may delete existing resources that the manifest no longer lists. The pre-flight policy approved "apply" — it did not evaluate the diff between the current cluster state and the desired state in the manifest.
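A minimal sketch makes the trap visible. Here cluster state and manifest are modeled as plain sets of resource names (toy stand-ins, not real Kubernetes objects), and "pruning apply" follows the semantics of kubectl's --prune flag:

```python
def plan_apply(cluster_state: set[str], manifest: set[str], prune: bool = False):
    """Sketch the effect of an apply: what gets created, and what a
    pruning apply would delete. Resource names are toy stand-ins."""
    created = manifest - cluster_state
    deleted = (cluster_state - manifest) if prune else set()
    return {"create": created, "delete": deleted}

current = {"deploy/web", "deploy/worker", "svc/api"}
desired = {"deploy/web", "svc/api"}  # the manifest no longer lists the worker

# Pre-flight policy approved the verb "apply" in both cases:
print(plan_apply(current, desired))              # nothing created, nothing deleted
print(plan_apply(current, desired, prune=True))  # the pruning apply deletes deploy/worker
```

Both calls are the same authorized verb; only the second one removes a workload. A policy engine that evaluates the verb sees no difference between them.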
Pre-flight policies answer the question: "Is this actor authorized to perform this type of action?" That is a necessary question. It is not the same question as: "Should this specific command, with these specific arguments, run right now?"
Layer 3: runtime — the actual command, with actual context
Runtime is the only place where you can see the fully-resolved command, in the actual execution context, at the actual moment it will run. The environment variables are set. The working directory is known. The target host is determined. The arguments are populated. There is nothing left to resolve, nothing left to generate.
A runtime guardrail intercepts the command after it has been fully constructed and before it executes. It sees exactly what the shell will see. And it can do something no other layer can do: block with certainty.
Here is what runtime interception looks like in practice:
```rust
// Simplified runtime interception (Rust, expacti-sshd)
// The command has been fully resolved — no variables left to expand.
async fn intercept_command(
    session: &Session,
    raw_command: &str,
) -> InterceptResult {
    // 1. Parse the fully-resolved command
    let parsed = CommandParser::parse(raw_command);

    // 2. Check the whitelist — exact match, glob, or regex
    if whitelist.matches(&parsed) {
        return InterceptResult::Allow;
    }

    // 3. Compute risk score from the actual command + context
    let risk = risk_engine.score(&parsed, session);

    // 4. Send to human reviewer with full context
    let decision = reviewer
        .request_approval(session, &parsed, risk)
        .await;

    match decision {
        Decision::Approved => InterceptResult::Allow,
        Decision::Denied(reason) => InterceptResult::Block(reason),
        Decision::Timeout => InterceptResult::Block("approval timed out"),
    }
}
```
At this point, there is no ambiguity. The command is kubectl delete namespace production, not "a kubectl command that might do something." The session context shows which agent generated it, what prompt triggered it, and which environment it targets. The human reviewer sees all of this before deciding.
Static analysis would have flagged subprocess.run during CI — a generic warning about a generic pattern. Pre-flight policy would have checked whether the agent's service account has permission to delete namespaces — it does, because sometimes deleting namespaces is legitimate. Runtime interception sees the specific namespace being deleted and asks a human: "Should this one happen?"
The three-layer model
These layers are not competitors. They are complementary. Each answers a different question, and all three questions matter.
- Pre-flight (static analysis + policy): "Is this code safe to deploy? Is this actor authorized to perform this class of action?" Good for access control, schema validation, and catching known bad patterns in code that exists on disk. Runs during CI/CD or at API gateway boundaries.
- Runtime approval: "Should this specific command, generated dynamically by an AI agent, execute right now in this context?" Good for safety. Catches the gap between authorization and consequence. Runs at the point of execution.
- Post-execution audit: "What happened? Who approved it? What was the outcome?" Good for compliance, incident response, and pattern detection. Runs after execution, using immutable logs.
The pre-flight layer is the outer perimeter. It keeps unauthorized actors and obviously bad code from getting close to production. The audit layer is the rearguard. It ensures that everything that did happen is recorded, searchable, and attributable. Runtime approval is the middle layer — the one that closes the gap between "authorized to act" and "safe to act right now."
Why the middle layer matters most for AI agents: Traditional software has relatively stable behavior — the same code produces the same commands. AI agents are non-deterministic. The same prompt can produce different commands on different runs. Pre-flight policies designed for deterministic systems cannot cover the output space of a stochastic agent. Runtime is the only layer that sees the actual output.
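The three layers compose into a single execution path. The sketch below is illustrative only (the class and method names are invented for this example, not an Expacti API): pre-flight checks the class of action, runtime approval checks the specific command, and the audit log records everything either way:

```python
from dataclasses import dataclass, field

@dataclass
class Guardrails:
    """Minimal sketch of the three-layer model."""
    authorized_verbs: set[str]                    # pre-flight: class of action
    audit_log: list[dict] = field(default_factory=list)

    def preflight(self, verb: str) -> bool:
        # Layers 1-2: is this actor allowed to perform this *type* of action?
        return verb in self.authorized_verbs

    def execute(self, command: str, approve) -> str:
        verb = command.split()[0]
        if not self.preflight(verb):
            outcome = "blocked: unauthorized verb"
        elif not approve(command):   # runtime: a human sees the actual command
            outcome = "blocked: denied at runtime"
        else:
            outcome = "executed"
        # Post-execution audit: record the command and outcome, approved or not
        self.audit_log.append({"command": command, "outcome": outcome})
        return outcome

g = Guardrails(authorized_verbs={"kubectl"})
# The verb is authorized, but the reviewer denies this specific invocation:
print(g.execute("kubectl delete namespace production", approve=lambda c: False))
```

Note that the pre-flight check passes here; only the runtime layer, seeing the resolved command, can stop it.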
What static analysis misses: a concrete example
A DevOps team runs an AI agent that manages Kubernetes deployments. The agent's code passes all static analysis checks. Bandit is clean. Semgrep rules are satisfied. The OPA policy allows the agent's service account to manage deployments in all namespaces.
A developer asks the agent: "Scale down the staging environment to save costs over the weekend."
The agent interprets "staging environment" based on its context window, which includes a document that lists namespace conventions. It generates:
```shell
kubectl scale deployment --all --replicas=0 -n staging
kubectl scale deployment --all --replicas=0 -n staging-data
kubectl scale deployment --all --replicas=0 -n prod-staging
```
The third namespace, prod-staging, is the production staging pipeline — a critical component that feeds data into the production environment. The agent included it because the name contains "staging." No static analysis tool can catch this because the command did not exist until the agent generated it. The OPA policy permits it because the agent is authorized to scale deployments in any namespace. Only a human reviewer, seeing the actual command with the actual namespace, would recognize that prod-staging is not what the developer intended.
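The failure mode is easy to reproduce. If the agent resolves "the staging environment" by substring matching against known namespaces (an assumption about its behavior, but a plausible one), prod-staging is swept in:

```python
def namespaces_matching(term: str, namespaces: list[str]) -> list[str]:
    """Naive selection by substring: roughly how a context-driven agent
    might resolve 'the staging environment' to concrete namespaces."""
    return [ns for ns in namespaces if term in ns]

cluster = ["staging", "staging-data", "prod-staging", "production"]
print(namespaces_matching("staging", cluster))
# ['staging', 'staging-data', 'prod-staging']
```

The selection is syntactically reasonable and semantically wrong, which is exactly the class of error that only a reviewer with context can catch.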
Why "just add more rules" does not scale
The natural response is to add more pre-flight rules. Block prod-* namespaces. Require explicit allowlists for namespace operations. Add regex constraints to the policy engine.
This works until it does not. AI agents are creative in the same way humans are — they find novel paths to accomplish goals. A sufficiently capable agent working on a complex task will eventually generate a command that no pre-flight rule anticipated, because the rule author could not predict every combination of context, intent, and execution path the agent might explore.
Rule-based systems are brittle against open-ended generators. You are playing whack-a-mole with an opponent that has a larger vocabulary than your ruleset. The asymmetry favors the agent, not the rules.
Runtime approval inverts the asymmetry. Instead of trying to enumerate every dangerous command in advance, you require approval for every command that is not explicitly known-safe. The whitelist is the set of things you have already evaluated and trust. Everything else gets human review. The burden shifts from "predict all bad things" to "confirm each new thing."
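A deny-by-default gate is small to express. The patterns below are hypothetical examples of an already-reviewed whitelist; this sketch uses shell-style globs via Python's fnmatch, one of several matching strategies a real implementation might support:

```python
import fnmatch

# Hypothetical whitelist: command patterns a human has already reviewed.
WHITELIST = [
    "kubectl get *",
    "kubectl describe *",
    "ls *",
]

def requires_review(command: str) -> bool:
    """Deny by default: anything not matching a known-safe pattern
    goes to a human reviewer."""
    return not any(fnmatch.fnmatch(command, pat) for pat in WHITELIST)

print(requires_review("kubectl get pods -n staging"))         # False: known-safe
print(requires_review("kubectl delete namespace production"))  # True: human review
```

Nothing in the whitelist has to anticipate the dangerous command; it only has to enumerate the safe ones.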
The cost of the middle layer
Runtime approval is not free. It introduces latency. It requires a human to be available. It can become a bottleneck if the agent generates a high volume of commands.
These are real trade-offs. But they are addressable:
- Whitelisting reduces volume. After a few days of operation, most routine commands are whitelisted. The approval queue shrinks to genuinely novel commands — exactly the ones that benefit most from human review.
- Risk scoring prioritizes attention. Low-risk commands (read-only, non-destructive, well-understood) can be auto-approved or given shorter timeouts. High-risk commands (destructive, first-time, production-scoped) get full review.
- Multi-party approval distributes load. A pool of eligible reviewers means no single person is a bottleneck. On-call rotations, AnyOf policies, and Slack-native approvals keep response times low.
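Risk scoring can be as simple as a weighted checklist. The verbs and weights below are illustrative only, not Expacti's actual scoring model:

```python
DESTRUCTIVE_VERBS = {"delete", "drop", "rm", "terminate", "scale"}

def risk_score(command: str, *, production: bool, first_seen: bool) -> int:
    """Toy additive risk score; the weights are illustrative."""
    score = 0
    if any(token in DESTRUCTIVE_VERBS for token in command.split()):
        score += 3   # destructive operation
    if production:
        score += 3   # production-scoped
    if first_seen:
        score += 2   # never reviewed before
    return score

# Routine read-only command: candidate for auto-approval
print(risk_score("kubectl get pods", production=False, first_seen=False))  # 0
# Destructive, production-scoped, first-time: full review
print(risk_score("kubectl delete namespace production",
                 production=True, first_seen=True))                        # 8
```

The point is not the specific weights but the triage: low scores shrink the queue, high scores get a human's full attention.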
The cost of not having the middle layer is harder to measure but easier to remember. It is the namespace that got deleted because the agent misinterpreted "staging." It is the IAM policy that got modified because the agent was authorized to manage IAM but not authorized to understand the security implications of a specific change. It is the database migration that locked a table during peak traffic because the agent had permission to run migrations but no context about the current load.
Expacti as a runtime guardrail
Expacti sits at the runtime layer. It intercepts every shell command an AI agent attempts to execute — through SSH proxy, PTY interception, or SDK integration — and holds it for human approval before execution. The command is fully resolved. The context is attached. The reviewer sees exactly what is about to happen.
It does not replace your linters, your OPA policies, or your IAM roles. Those layers do important work. Expacti is the layer between "authorized" and "executed" — the moment of conscience where a human confirms that this specific action, right now, is the right thing to do.
If you are running AI agents in production without a runtime approval layer, you are relying on pre-flight policies to cover a runtime problem. That works until the agent generates something your policies did not anticipate. And the defining characteristic of AI agents is that they will, eventually, generate something you did not anticipate.
Add a runtime guardrail to your AI agent stack
Read the docs to get started, or try the reviewer dashboard to see runtime approval in action.