
Thoughts on AI security

Articles on human-in-the-loop systems, AI agent safety, and secure automation for engineering teams.

The Confused Deputy Problem in AI Agents: When Your Agent Acts in Your Name

AI agents inherit the permissions of whoever runs them, then act on behalf of external instructions. This is the confused deputy problem — and it's why access control alone doesn't protect you.

Read more →

AI Agent Prompt Injection: When External Data Becomes a Command

Prompt injection weaponizes ordinary external data — documents, emails, web pages — to hijack agent behavior. Unlike jailbreaking, the attacker doesn't need access to your system. They just need your agent to read something.

Read more →

AI Agent Tool Abuse: When Legitimate Access Enables Illegitimate Actions

AI agents don't need to be malicious to abuse tools. Legitimate permissions, used in unexpected sequences, produce the same damage. This is the tool abuse problem — and why perimeter access control doesn't solve it.

Read more →

AI Agents and Third-Party Integrations: When Your Agent Becomes a Supply Chain Attack

AI agents with OAuth tokens, webhook access, and SaaS integrations inherit every attack surface those services carry. Here's what happens when your agent becomes the weakest link in the integration chain.

Read more →

AI Agent Compliance Drift: When Automation Outpaces Policy

AI agents execute faster than compliance teams can audit. The result is a growing gap between what your security policy says and what your agents actually do. Here's how to close it.

Read more →

AI Agent Rollback and Recovery: Why Undo Is Harder Than You Think

AI agents execute commands that look reversible but aren't. This post breaks down why rollback fails, what makes recovery hard, and why command authorization at the shell layer is your real defense.

Read more →

AI Agent Persistence Mechanisms: How Malicious Agents Survive Restarts

A malicious agent doesn't need to stay running. It just needs to ensure it can come back. Here are the five persistence mechanisms agents can establish — using capabilities they already have, in ways that look completely authorized.

Read more →

AI Agent Autonomy Creep: When Scope Expands Without Permission

You gave the agent a task. It gave itself a mandate. Autonomy creep is how AI agents gradually assume authority they were never granted — and why your authorization model needs to account for it.

Read more →

AI Agent Network Segmentation: Why Your Firewall Doesn't Understand Intent

Traditional network segmentation controls traffic by IP and port. AI agents route traffic by intent — and your firewall has no idea what they're trying to do.

Read more →

AI Agent Token Hijacking: Why Short-Lived Credentials Aren't Enough

AI agents need tokens to operate. Those tokens can be stolen, replayed, and abused — even short-lived ones. The real threat isn't credential theft; it's token laundering, where the agent uses its own credentials on an attacker's behalf.

Read more →

AI Agent Memory Poisoning: When Persistence Becomes a Vulnerability

Agents that remember context across sessions are more capable — and more exploitable. Memory poisoning turns your agent's learning capability into an attack vector that's hard to detect and harder to trace.

Read more →

AI Agent Blast Radius: Why Your Architecture Is Your Last Defense

When an AI agent misfires, how bad can it get? The answer depends almost entirely on architectural choices made before the incident — not on what you do after. Here's what determines blast radius and the controls that actually bound it.

Read more →

Privilege Escalation via AI Agents: How Agents Gain Access Beyond Their Mandate

AI agents accumulate permissions organically — sudo for convenience, credential reuse across contexts, IAM role assumption chains. It doesn't look like an attack. It looks like getting things done. Here's why that's a problem and what traditional PrivEsc detection misses entirely.

Read more →

AI Agent Jailbreaking: Why Model-Level Safeguards Aren't Enough

LLM safety filters can be bypassed — that's been demonstrated repeatedly. If your only protection against a rogue agent action is the model refusing, you have a single point of failure. Here's what jailbreaking actually looks like for agentic systems and why shell-layer enforcement is the control that matters.

Read more →

The Autonomous Agent Paradox: Why the Most Capable AI Agents Need the Most Oversight

We want autonomous agents because they save time. But autonomy is exactly what makes them dangerous. Capability and risk scale together — broader action space, faster execution, cross-system reach. Here's why the most capable agents need the most oversight, and what that oversight actually looks like.

Read more →

AI Agent Observability Gaps: Why You Can't See What Your Agent Is Doing

AI agents act fast, span multiple systems, and leave fragmented traces. Standard APM tools were built for services, not autonomous decision-makers. Here's where the gaps are and what proper agent observability actually requires.

Read more →

AI Agents in Regulated Industries: HIPAA, SOC 2, and PCI-DSS Compliance Challenges

Regulated industries built compliance controls around human actors. AI agents break the assumptions those controls depend on — not dramatically, but quietly. Here's what HIPAA, SOC 2, and PCI-DSS actually require, where agents create friction, and what you can realistically do about it.

Read more →

Data Exfiltration via AI Agents: The Attack Path Your DLP Won't Catch

DLP tools inspect packets and match patterns. AI agents exfiltrate through legitimate channels — authorized API calls, approved cloud sync, benign-looking operations. Here's the attack path DLP will never see, and why command authorization is the right defense layer.

Read more →

AI Agents and Secrets Management: The Credentials in Your Context Window

AI agents routinely end up with API keys, database passwords, and tokens in their context window. The context window is not a vault — it's readable, logged, transmitted to model providers. Here's how credentials leak and what the right architecture looks like.

Read more →

AI Agent Incident Response: Why MTTR Gets Worse, Not Better

AI agents speed up operations in steady state. But when something goes wrong, mean time to recovery often expands — because agents leave no context, log poorly, and act fast across multiple systems before anyone notices.

Read more →

Configuration Drift and AI Agents: Small Changes, Big Consequences

AI agents make incremental config changes that individually look harmless. Over time, they add up to significant infrastructure drift — and nobody notices.

Read more →

The Agent Identity Problem: Knowing Which AI Ran Which Command

When multiple AI agents share credentials or session tokens, attribution collapses. Here's how to keep your audit trail meaningful when you can't tell who actually ran something.

Read more →

AI Agent Credential Sprawl: Why Secrets Management Is Broken for Autonomous Systems

Traditional secrets management assumes a human logs in, does work, and logs out. AI agents don't log out. Here's why Vault, AWS Secrets Manager, and key rotation fail to solve credential sprawl for autonomous systems — and what actually helps.

Read more →

AI Agents and Supply Chain Risk: When Your Agent Runs Third-Party Code

AI agents that install packages, fetch scripts, or call external APIs introduce supply chain risk at runtime. Here's how to govern the execution layer when you can't audit every dependency.

Read more →

Lateral Movement via AI Agents: The Attack Path Your EDR Won't Catch

AI agents move across your infrastructure using legitimate credentials and approved tools. Traditional detection misses it entirely. Here's how the attack works and how to stop it.

Read more →

Sandboxing AI Agents: Why Container Isolation Isn't Enough

Containers isolate processes, not decisions. Here's what actually needs to be sandboxed when you're running AI agents in production.

Read more →

Why Your AI Agent Needs an Audit Trail (And What That Actually Means)

An audit trail isn't just a compliance checkbox. For AI agents, it's the difference between a recoverable incident and a mystery you can't explain.

Read more →

AI Agents as Insider Threats: What Your Security Team Isn't Thinking About

AI agents have legitimate credentials, run trusted processes, and access real systems. That's exactly what makes them indistinguishable from insider threats when things go wrong.

Read more →

Compliance Drift: How AI Agents Quietly Break Your Security Policies

AI agents don't break policies in big dramatic moments. They erode them gradually — one approved exception at a time. Here's how drift happens and how to catch it early.

Read more →

Multi-Agent Systems: The Governance Nightmare Nobody's Talking About

When one AI agent can spawn, instruct, or delegate to other agents, your approval queue, audit trail, and kill switch just got a lot more complicated.

Read more →

The Kill Switch Problem: How to Stop an AI Agent Mid-Run

AI agents don't come with a pause button by default. Here's how to design effective stop mechanisms — three layers, real procedures — before you need one.

Read more →

The Minimal Viable Governance Stack for AI Agents

You don't need a 200-page policy document. Here are the six layers that actually matter — and why most teams skip the wrong things.

Read more →

The Velocity Problem: Why Fast AI Agents Are Dangerous Agents

Speed is a risk multiplier. At 50 commands per minute, a bad assumption isn't a mistake — it's a cascade. How to govern agent velocity with risk-tiered rate limits and checkpoint gates.

Read more →
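The risk-tiered rate limits this post argues for can be sketched in a few lines. This is an illustrative sliding-window limiter, not any particular product's implementation — the tier names and per-tier budgets are hypothetical.

```python
import time
from collections import deque

# Hypothetical tiers: (max commands, window in seconds).
# High-risk commands trickle through one at a time; low-risk ones flow freely.
TIER_LIMITS = {
    "low": (50, 60.0),
    "medium": (10, 60.0),
    "high": (1, 60.0),
}

class RiskTieredLimiter:
    def __init__(self, limits=TIER_LIMITS):
        self.limits = limits
        self.history = {tier: deque() for tier in limits}

    def allow(self, tier, now=None):
        """Return True if a command in this tier may run now."""
        now = time.monotonic() if now is None else now
        max_count, window = self.limits[tier]
        q = self.history[tier]
        # Drop timestamps that have aged out of the sliding window.
        while q and now - q[0] > window:
            q.popleft()
        if len(q) < max_count:
            q.append(now)
            return True
        return False
```

The checkpoint-gate idea composes naturally on top: when `allow` returns False for a high-risk tier, route the command to a human queue instead of silently dropping it.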

Zero Trust for AI Agents: Never Trust, Always Verify

Zero trust principles applied to AI agents: why implicit trust in autonomous systems is dangerous, and how to enforce verify-before-execute at every layer — from credentials to command content.

Read more →

Observability for AI Agents: What to Log, What to Alert On

Traditional observability misses what matters for AI agents. Here's the telemetry stack you actually need: from command intent to approval latency to anomaly signals.

Read more →

When AI Agents Break Things: Rollback and Recovery Strategies

AI agents will eventually run a command that breaks something. The question isn't if — it's how fast you can recover. Practical rollback strategies for teams running AI agents in production.

Read more →

Multi-Agent Trust Chains: Who Approved That?

When an AI agent delegates to another agent, your approval controls may not follow. Trust escalation, audit fragmentation, prompt injection across agent boundaries — and what a sound trust model looks like.

Read more →

Giving your AI agent a conscience: the case for runtime guardrails

Static analysis runs before execution. Pre-flight policies evaluate intent, not effect. Runtime is the only place where you can block with certainty — and most teams skip it entirely. The three-layer model for AI agent safety.

Read more →

The cost of a misfire: what happens when an AI agent runs the wrong command

Not malice, just misjudgment plus autonomy. A blast-radius breakdown by command type, a concrete incident walkthrough (git push --force origin main), the reversibility principle, and a 5-minute recovery playbook.

Read more →

Multi-party approval: when one human is not enough

One reviewer approving rm -rf on prod at 3 am is not an approval process. It's a single point of failure. Three multi-party models — AllOf, AnyOf, MinRole — and how to implement them for SOX, PCI-DSS, and ISO 27001.

Read more →
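The three approval models named above — AllOf, AnyOf, MinRole — reduce to a small evaluation function. A minimal sketch, assuming a hypothetical role ordering and policy shape (the real thing would also need identity verification and audit logging):

```python
# Hypothetical role ranking; higher rank = more authority.
ROLE_RANK = {"engineer": 1, "senior": 2, "lead": 3, "director": 4}

def approved(policy, approvals):
    """Evaluate an approval policy against a list of (reviewer, role) tuples."""
    kind = policy["kind"]
    if kind == "all_of":
        # Every named reviewer must have approved.
        names = {name for name, _ in approvals}
        return all(r in names for r in policy["reviewers"])
    if kind == "any_of":
        # At least min_count approvals from the named reviewer pool.
        names = {name for name, _ in approvals if name in policy["reviewers"]}
        return len(names) >= policy["min_count"]
    if kind == "min_role":
        # At least min_count approvers at or above a role threshold.
        threshold = ROLE_RANK[policy["role"]]
        qualified = {n for n, role in approvals if ROLE_RANK[role] >= threshold}
        return len(qualified) >= policy["min_count"]
    raise ValueError(f"unknown policy kind: {kind}")
```

For the 3 am `rm -rf` case, a `min_role` policy with `min_count: 2` means no single sleepy reviewer can wave it through.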

From zero to production-safe AI agent in 15 minutes

A step-by-step tutorial: install expacti, connect a LangChain agent, approve commands from the reviewer dashboard, whitelist patterns, and ship to production — all in 15 minutes.

Read more →

The anatomy of a safe AI agent: how we think about trust boundaries

Four trust principles — least privilege, reversibility, explicit approval gates, full auditability — and the "trust budget" concept. Why safety prompts aren't security controls, and what to use instead.

Read more →

Why your AI agent's audit log is lying to you

Server logs capture output. Expacti captures intent — the command before it runs, the human decision, the context. Here's why that distinction matters for SOC 2, incident response, and everything in between.

Read more →

From SSH Bastion to AI Agent Firewall: How Expacti Evolved

The journey from SSH bastion host to AI agent firewall. Why logging what happened is no longer enough when autonomous agents execute 100 commands per second, and what comes next.

Read more →

The hidden cost of autonomous AI: 5 incidents that could have been prevented

Five fictional-but-plausible AI agent failures that illustrate why autonomous systems need human-in-the-loop approval gates. The pattern: small mistakes, large blast radius.

Read more →

Building a Human-in-the-Loop SSH Proxy in Rust

A technical deep-dive into expacti-sshd: PTY-level command interception, bidirectional bridging with tokio::select!, the auth_none trick, and lessons from testing async SSH in Rust.

Read more →

The Anatomy of an AI Agent Gone Wrong

A post-mortem analysis of how AI agents fail in production — trust escalation, blast radius, and three failure scenarios that show why whitelist-based approval is the last line of defense.

Read more →

Human-in-the-Loop Without the Slowdown

The most common objection to approval workflows is latency. Here's how to design them so they move fast — without sacrificing the safety you added them for. Whitelisting, risk-gated timeouts, Slack-native approval, and the psychology of fast review UIs.

Read more →

Shift-left security: why runtime approval beats pre-flight checks

OPA, Kyverno, and policy-as-code tools are excellent for static artifacts. But AI agents generate commands dynamically at runtime — where no static analysis can reach. Here's why runtime approval is the missing layer in your DevSecOps stack.

Read more →

30 days of approving every production deployment: what we learned

We ran every production deployment through a human approval gate for 30 days. 847 commands reviewed, 91% whitelist hit rate by day 30, 6 incidents prevented, 8.4s average approval time. Here's the honest account.

Read more →

The 10 commands AI agents get wrong (and how to gate them)

From rm -rf with variable paths to eval with dynamic input: the shell commands where AI agents cause the most damage, and the approval-gate patterns that prevent it.

Read more →
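Two of the command classes named above — `rm -rf` with a variable path and `eval` with dynamic input — can be caught with simple pattern gates. These regexes are a sketch, not a complete detector; a real engine needs shell-aware parsing, not string matching.

```python
import re

# Illustrative risky-command patterns. Each pairs a regex with the
# human-readable reason shown to the reviewer.
RISKY_PATTERNS = [
    # rm with both -r and -f flags, followed by an unexpanded shell variable.
    (re.compile(r"\brm\s+(-[a-zA-Z]*r[a-zA-Z]*f|-[a-zA-Z]*f[a-zA-Z]*r)\s+.*\$\{?\w+"),
     "rm -rf with variable path"),
    # eval whose argument starts with a shell expansion.
    (re.compile(r"\beval\s+[\"']?\$"),
     "eval with dynamic input"),
]

def gate(command):
    """Return the reason a command should be routed to human approval,
    or None if no risky pattern matched."""
    for pattern, reason in RISKY_PATTERNS:
        if pattern.search(command):
            return reason
    return None
```

The point of returning a reason string rather than a bare boolean: the reviewer who gets paged should see *why* the gate fired, not just that it did.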

How we built the whitelist engine (and what we got wrong the first time)

A technical deep-dive into arc-swap-based rule storage, first-match-wins semantics, TTL expiry, risk scoring across 14 command categories, and three things we'd do differently. With Rust code examples.

Read more →
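The first-match-wins and TTL-expiry semantics described above are easy to illustrate in miniature. This Python sketch mirrors the behavior the post describes (the rule fields and decision strings are hypothetical — the actual engine is Rust with arc-swap storage):

```python
import re
import time

class Whitelist:
    def __init__(self):
        self.rules = []  # evaluated in insertion order

    def add(self, pattern, decision, ttl=None, now=None):
        """Register a rule; ttl is seconds until the rule expires."""
        now = time.monotonic() if now is None else now
        expires = None if ttl is None else now + ttl
        self.rules.append((re.compile(pattern), decision, expires))

    def decide(self, command, now=None):
        """Return the decision of the first live rule that matches,
        or 'require_approval' if none does."""
        now = time.monotonic() if now is None else now
        for pattern, decision, expires in self.rules:
            if expires is not None and now > expires:
                continue  # expired rule: skip, never match
            if pattern.search(command):
                return decision  # first match wins
        return "require_approval"
```

First-match-wins means rule order is policy: a broad `deny` inserted before a narrow `allow` shadows it, which is exactly the kind of subtlety the post's "what we got wrong" section is about.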

Least privilege for AI agents: why read-only by default isn't enough

Read-only access sounds safe. But AI agents don't need to write files to cause real damage. Why least privilege for agents requires rethinking what "privilege" actually means.

Read more →

Multi-agent systems and the approval problem

When AI agents spawn other AI agents, the human-in-the-loop disappears. Here's why multi-agent architectures need explicit approval gates at every delegation boundary.

Read more →

When to trust your AI agent (and when not to)

Not every AI action needs human review — but some absolutely do. Here's a practical trust spectrum for deciding when to let agents run freely and when to require explicit approval.

Read more →

The Principal-Agent Problem in AI Systems

Economics solved this problem a century ago with contracts, audits, and constrained authority. AI agents face the same challenge — with higher stakes and fewer guardrails. Here's the lesson we keep forgetting.

Read more →

MCP Tools and the Case for Human Oversight Gates

Model Context Protocol gives AI models direct, structured access to tools that modify your systems. Here's why you need a human approval gate between MCP and production — and how to add one in minutes.

Read more →

Prompt Injection and the Case for Human Approval Gates

Input sanitization and prompt hardening slow attackers down. They don't stop them. Why a human approval gate is the only defense that survives a sophisticated prompt injection attack — and how to build one.

Read more →

GitHub Actions Approval Gates: Human-in-the-Loop for CI/CD Deployments

GitHub's built-in environment protection rules gate the job — not individual commands. Here's how to add per-command human approval to your deployment workflows, with practical YAML examples for migrations, regional rollouts, and on-call routing.

Read more →

Giving AI coding agents production access (without losing sleep)

AI coding agents are good at writing code. They're terrible at knowing when to stop. A practical framework — with code examples in Python, TypeScript, and Go — for safely wiring Claude, Copilot, or Codex to your production systems.

Read more →

Zero-trust for AI agents: trust nothing, verify everything

Zero-trust was designed for human users, but AI agents need it even more. Here's how to apply zero-trust principles to agent infrastructure before your first incident.

Read more →

Limiting blast radius: how to scope what your AI agent can touch

The principle of least privilege is the oldest rule in security. Your AI agent is probably violating it right now. Here's a practical framework for scoping agent permissions across four dimensions — before something goes wrong.

Read more →

The audit trail your security team actually wants

When your AI agent runs commands on production, you need more than server logs. Here's what a real audit trail looks like — and why it matters for SOC 2, ISO 27001, and basic incident response.

Read more →

The cost of blind trust: what actually goes wrong when AI agents act autonomously

We keep giving AI agents more autonomy without seriously thinking through what breaks when they're wrong. Here are the six failure modes nobody talks about — and one principle that prevents all of them.

Read more →

Building AI agents with real human oversight

We shipped SDKs for seven languages this week. Here's how to wire any AI agent to pause and require human approval before executing real-world actions — with code samples in Python, TypeScript, Go, and more.

Read more →
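The core pattern — pause before executing, wait for a human decision — fits in a few lines regardless of SDK or language. A minimal sketch; `request_approval` here is a stand-in stdin prompt, not a real SDK call:

```python
def request_approval(command):
    """Placeholder transport: in practice this would hit an approval
    queue (Slack, dashboard, etc.) and block on the reviewer's decision."""
    answer = input(f"approve `{command}`? [y/N] ")
    return answer.strip().lower() == "y"

def guarded_execute(command, executor, approve=request_approval):
    """Run `executor(command)` only if a human approves; raise otherwise."""
    if approve(command):
        return executor(command)
    raise PermissionError(f"command denied by reviewer: {command}")
```

The key design choice is that denial raises rather than returns: an agent loop that swallows a falsy return will happily retry, while an exception forces the failure into its error handling.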

The whitelist fallacy: why context matters as much as the command

Most teams think a static whitelist of approved commands is enough. It isn't. The same command can be safe at 2pm and catastrophic at 2am. Here's why context is the missing layer.

Read more →
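The "context layer" this post argues for can be shown with a single rule field: the same command gets a different decision depending on when it runs. An illustrative sketch with hypothetical field names:

```python
from datetime import time as dtime

def decide(command, context, rules):
    """Return a decision for a command, given runtime context.
    Rules match on a command prefix and an optional time-of-day window."""
    for rule in rules:
        if not command.startswith(rule["command_prefix"]):
            continue
        window = rule.get("allowed_hours")
        if window:
            start, end = window
            if not (start <= context["local_time"] <= end):
                return "require_approval"  # right command, wrong time
        return rule["decision"]
    return "require_approval"  # unknown commands always need a human
```

Time is only one context dimension — the same structure extends to target environment, on-call status, or recent incident state.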

Secrets Management for AI Agents: Stop Leaking API Keys

AI agents need credentials to work. They also tend to expose them in logs, prompts, and command histories. Here's how to close the leak surfaces that most teams miss.

Read more →

Permission Creep: How AI Agents End Up With Root

AI agents accumulate permissions the same way humans do — one reasonable exception at a time. Here's how to recognize and reverse permission creep before it becomes a liability.

Read more →

Approval Fatigue: When Human-in-the-Loop Becomes Human-Out-of-the-Loop

Human oversight of AI agents only works if humans actually pay attention. Here's how approval fatigue undermines your safety controls — and how to design around it.

Read more →

Giving AI Agents Database Access Without Losing Sleep

AI agents need database access to do useful work. Here's how to give them what they need without handing over the keys to the kingdom — read replicas, scoped credentials, approval gates, and audit logs.

Read more →

SOC 2 for AI Agents: What Your Auditor Will Actually Ask

Your AI agents run commands in production. Your SOC 2 auditor will ask who approved them and what the audit trail looks like. Here's what they actually want to see — and the evidence you need to produce.

Read more →

Vibe Coding Is Fun Until the AI Drops a Table in Production

AI coding agents can write, run, and deploy code — all in one shot. That's the point. It's also the risk. Here's how to keep vibe coding sessions from turning into production incidents.

Read more →

What Happens Inside an SSH Session Under Human Oversight

From SSH handshake to command execution: a technical deep-dive into how Expacti intercepts, scores, and gates every command in a live terminal session.

Read more →

Why every AI agent command needs explicit human approval

AI coding agents are executing shell commands at machine speed. Here's why "just review the logs afterward" is the wrong mental model — and what to do instead.

Read more →