30 days of approving every production deployment: what we learned
We decided to run every production deployment through a human approval gate for an entire month. No exceptions — every docker compose pull, every migration, every config change. Here's what caught us off-guard, what became routine faster than expected, and what we'd never go back on.
The setup
We have a reasonably typical stack: a few microservices, a PostgreSQL database, Caddy as a reverse proxy, some background workers. We deploy via GitHub Actions, typically 3–8 times per day across the team. Nothing exotic.
The change: we added expacti's expacti-action to every deploy step. Any command that touched production — docker compose up -d, database migrations, service restarts — now paused for approval before executing. We got Slack notifications; we had 60 seconds to approve or the command would be auto-denied.
```yaml
# Before
- name: Deploy backend
  run: docker compose pull backend && docker compose up -d --no-deps backend

# After
- uses: expacti/expacti-action@v1
  with:
    command: "docker compose pull backend && docker compose up -d --no-deps backend"
    backend_url: ${{ secrets.EXPACTI_URL }}
    shell_token: ${{ secrets.EXPACTI_TOKEN }}
    timeout: 60
```
We ran this for 30 days, logged every approval and denial, and wrote down observations as they happened.
The numbers
The number that surprised us most was the whitelist hit rate. By day 30, 91% of commands executed without any human intervention; they matched a whitelist rule built up over the first two weeks. The approval overhead essentially disappeared for routine work.
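If you want to track the same metric, it falls out of the approval log directly. A minimal sketch, assuming a hypothetical log format (the record shape here is illustrative, not expacti's actual export):

```python
from collections import Counter

# Hypothetical approval log: one record per command that reached the gate.
# "whitelisted" means the command matched a rule and ran with no human involved.
log = [
    {"command": "docker compose pull backend", "outcome": "whitelisted"},
    {"command": "docker compose up -d --no-deps backend", "outcome": "whitelisted"},
    {"command": "alembic upgrade head", "outcome": "approved"},
    {"command": "docker exec -it api bash", "outcome": "denied"},
]

outcomes = Counter(entry["outcome"] for entry in log)
hit_rate = outcomes["whitelisted"] / len(log)
print(f"whitelist hit rate: {hit_rate:.0%}")  # 2 of 4 commands -> 50%
```

Plotting this per day is how we watched the hit rate climb toward 91% as rules accumulated.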
The first week: chaos and learning
Week one was rough. The whitelist was empty. Every single command hit the approval queue. Slack was a wall of notifications. The median approval time was 34 seconds because we were still reading the commands carefully.
Two examples from that week. A docker exec -it api bash step had been left in a deploy script from a debugging session. The command paused; we denied it; the developer removed it. It would have opened an interactive shell on prod that nobody needed.

A migration ran ALTER TABLE users DROP COLUMN legacy_data. The risk score was 85/100 (CRITICAL). We paused, checked the code, confirmed the column was safe to drop, and approved. Nothing broke, but we were glad we looked.

The second week: the whitelist becomes your policy
By week two, something shifted. The whitelist wasn't just a cache of approved commands — it had become a written record of what our deploy pipeline was allowed to do.
When a new developer joined, we could point them at the whitelist and say: "here's exactly what happens on every production deploy." Not a README that might be out of date. The actual, live policy.
We started treating deny decisions as policy decisions. If we denied a command, we added a comment explaining why. Those comments accumulated into a kind of living runbook.
The whitelist became the most accurate documentation of our deploy process. It can't drift out of date the way a README can: it is the actual process, recorded as each step was approved.
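Conceptually, the rules we ended up with came in two shapes: exact command strings and shell-style globs, each carrying the "why" from the original review. A minimal sketch of how such matching could work (the rule format and comments here are our illustration, not expacti's actual schema):

```python
import fnmatch

# Hypothetical whitelist: exact strings plus shell-style globs, each with the
# rationale recorded at approval time. This is what made the list double as policy.
WHITELIST = [
    {"pattern": "docker compose pull backend", "kind": "exact",
     "comment": "routine image pull, reviewed day 2"},
    {"pattern": "docker compose up -d --no-deps *", "kind": "glob",
     "comment": "single-service restarts; --no-deps keeps the blast radius small"},
]

def is_whitelisted(command: str) -> bool:
    """Return True if the command matches any exact or glob rule."""
    for rule in WHITELIST:
        if rule["kind"] == "exact" and command == rule["pattern"]:
            return True
        if rule["kind"] == "glob" and fnmatch.fnmatch(command, rule["pattern"]):
            return True
    return False

print(is_whitelisted("docker compose up -d --no-deps backend"))  # True
print(is_whitelisted("docker exec -it api bash"))                # False
```

The comment field is the part that matters for onboarding: a new developer reads the rationale next to the rule, not a stale wiki page.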
The six incidents we prevented
"Incidents prevented" is inherently speculative — we can't know what would have happened if we'd let the commands through. But six times during the 30 days, someone denied a command that turned out to be wrong:
- The debugging shell (Day 3): docker exec -it left in a deploy script accidentally.
- The wrong target (Day 9): A migration was pointed at production, not staging. Risk score 72 (HIGH) triggered careful review. Caught before running.
- The force push (Day 14): git push origin main --force appeared in a GitHub Actions step. Risk score 45 (MEDIUM); the review made us look twice. The force push was unnecessary; the dev was cleaning up after a rebase that had already merged.
- The log rotation (Day 18): A cleanup script included rm -rf /var/log/app-*. Approved after review, but the review made us realize we should archive those logs, not delete them. Refactored before running.
- The off-hours schema change (Day 22): Someone kicked off a migration at 11pm on a Friday; anomaly detection fired on both the time and the high risk score. Denied. Ran it properly Monday morning.
- The double deploy (Day 28): A CI job got triggered twice. The second deploy was identical to the first, which had just completed. Denied — not harmful, but also unnecessary.
None of these were catastrophic. But incidents 2 and 3 in particular could have caused real problems if they'd run unreviewed. The cost of catching them: a few seconds of a developer's attention.
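The off-hours check that caught the day-22 migration is simple enough to sketch. The thresholds below are our illustration of the idea, not expacti's actual anomaly rules:

```python
from datetime import datetime

# Illustrative anomaly check: flag high-risk commands outside business hours.
# Thresholds (10pm cutoff, risk >= 70, weekends) are assumptions for the sketch.
def is_anomalous(risk_score: int, when: datetime) -> bool:
    off_hours = when.hour >= 22 or when.hour < 7 or when.weekday() >= 5
    return off_hours and risk_score >= 70

# The day-22 case: 11pm on a Friday with a high risk score gets flagged.
print(is_anomalous(80, datetime(2024, 6, 14, 23, 0)))  # True
print(is_anomalous(80, datetime(2024, 6, 17, 10, 0)))  # False (Monday morning)
```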
What was genuinely annoying
Honest account: three things caused real friction.
1. Late-night deploys
Anyone doing work after 10pm had to either wait for someone else to be online or approve their own commands (which defeats the purpose for critical operations). We ended up setting up a Slack channel specifically for late-night approvals. Not ideal.
The fix: for our team, this meant being more deliberate about when we deployed. We stopped doing production changes outside business hours unless genuinely necessary. Side effect: fewer off-hours incidents by attrition.
2. Long-running commands
The 60-second timeout was too short for some commands (database migrations on large tables, image builds). We bumped it to 300 seconds for known slow operations.
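In practice that meant overriding the timeout per step, using the same `timeout` input shown earlier (the migration command here is a hypothetical example):

```yaml
# Known-slow step: give reviewers five minutes instead of one.
- uses: expacti/expacti-action@v1
  with:
    command: "docker compose run --rm backend alembic upgrade head"
    backend_url: ${{ secrets.EXPACTI_URL }}
    shell_token: ${{ secrets.EXPACTI_TOKEN }}
    timeout: 300
```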
3. The whitelist maintenance burden
After 30 days we had 340 whitelist rules. Some of them were redundant (an exact match AND a glob for the same command family). The AI suggestions helped, but we still needed a cleanup session at the end of the month. Build up the whitelist intentionally, and prune it periodically.
What surprised us
The biggest surprise wasn't technical — it was cultural.
The act of approving made people read commands. Before, deploy scripts were committed, CI ran them, and unless something broke nobody looked closely. Now, every command had a human who consciously said "yes, this is fine." That changed how people felt about deploy scripts — they became documents, not just automation artifacts.
Within two weeks, we started seeing developers write clearer commands in their deploy scripts. Not because we told them to — but because they knew a human would be reading each one. The approval gate created an implicit code review for CI/CD configuration.
The 8.4 second average approval time is ours, with our team size and Slack notification setup. Your mileage will vary significantly based on how responsive your reviewers are. Budget for this when evaluating adoption.
Month two: what changed
We're still running it. Some changes after 30 days:
- Expanded coverage to include infrastructure changes (Terraform plans now route through the approval queue)
- Added a second reviewer requirement for database schema changes
- Cleaned up the whitelist from 340 to 180 rules using the AI suggestions export
- Onboarded our staging environment — turns out staging deploys were also worth reviewing once we could see them in the audit log
The question we get most often: "doesn't it slow you down?" The answer is: not meaningfully. The whitelist handles routine work automatically. Reviews only happen for the things that actually need a human. For us, that's roughly once or twice a day — and it's exactly the once or twice a day we'd want to be paying attention anyway.
Try it on your next deploy
One GitHub Actions step is all it takes. No config, no infrastructure changes.
▶ See the demo DevOps use case