30 days of approving every production deployment: what we learned
We decided to run every production deployment through a human approval gate for an entire month. No exceptions — every docker compose pull, every migration, every config change. Here's what caught us off-guard, what became routine faster than expected, and what we'd never go back on.
The setup
We have a reasonably typical stack: a few microservices, a PostgreSQL database, Caddy as a reverse proxy, some background workers. We deploy via GitHub Actions, typically 3–8 times per day across the team. Nothing exotic.
The change: we added expacti's expacti-action to every deploy step. Any command that touched production — docker compose up -d, database migrations, service restarts — now paused for approval before executing. We got Slack notifications; we had 60 seconds to approve or the command would be auto-denied.
```yaml
# Before
- name: Deploy backend
  run: docker compose pull backend && docker compose up -d --no-deps backend

# After
- uses: expacti/expacti-action@v1
  with:
    command: "docker compose pull backend && docker compose up -d --no-deps backend"
    backend_url: ${{ secrets.EXPACTI_URL }}
    shell_token: ${{ secrets.EXPACTI_TOKEN }}
    timeout: 60
```
We ran this for 30 days, logged every approval and denial, and wrote down observations as they happened.
The numbers
The number that surprised us most was the whitelist hit rate. By day 30, 91% of commands executed without any human intervention; they matched a whitelist rule built up over the first two weeks. The approval overhead essentially disappeared for routine work.
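If you want to track the same metric, it falls out of the approval log directly. A minimal sketch, assuming a hypothetical log format (the record shape here is illustrative, not expacti's actual export):

```python
from collections import Counter

# Hypothetical approval log: one record per command that reached the gate.
# "whitelisted" means the command matched a rule and ran with no human involved.
log = [
    {"command": "docker compose pull backend", "outcome": "whitelisted"},
    {"command": "docker compose up -d --no-deps backend", "outcome": "whitelisted"},
    {"command": "alembic upgrade head", "outcome": "approved"},
    {"command": "docker exec -it api bash", "outcome": "denied"},
]

outcomes = Counter(entry["outcome"] for entry in log)
hit_rate = outcomes["whitelisted"] / len(log)
print(f"whitelist hit rate: {hit_rate:.0%}")  # 2 of 4 commands -> 50%
```

Plotting this per day is how we watched the hit rate climb toward 91% as rules accumulated.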
The first week: chaos and learning
Week one was rough. The whitelist was empty. Every single command hit the approval queue. Slack was a wall of notifications. The median approval time was 34 seconds because we were still reading the commands carefully.
Two examples from that week. A docker exec -it api bash step had been left in a deploy script from a debugging session. The command paused; we denied it; the developer removed it. It would have opened an interactive shell on prod that nobody needed.

A migration ran ALTER TABLE users DROP COLUMN legacy_data. The risk score was 85/100 (CRITICAL). We paused, checked the code, confirmed the column was safe to drop, and approved. Nothing broke, but we were glad we looked.

The second week: the whitelist becomes your policy
By week two, something shifted. The whitelist wasn't just a cache of approved commands — it had become a written record of what our deploy pipeline was allowed to do.
When a new developer joined, we could point them at the whitelist and say: "here's exactly what happens on every production deploy." Not a README that might be out of date. The actual, live policy.
We started treating deny decisions as policy decisions. If we denied a command, we added a comment explaining why. Those comments accumulated into a kind of living runbook.
The whitelist became the most accurate documentation of our deploy process. It can't drift out of date the way a README can: it is the actual process, recorded as each step was approved.
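Conceptually, the rules we ended up with came in two shapes: exact command strings and shell-style globs, each carrying the "why" from the original review. A minimal sketch of how such matching could work (the rule format and comments here are our illustration, not expacti's actual schema):

```python
import fnmatch

# Hypothetical whitelist: exact strings plus shell-style globs, each with the
# rationale recorded at approval time. This is what made the list double as policy.
WHITELIST = [
    {"pattern": "docker compose pull backend", "kind": "exact",
     "comment": "routine image pull, reviewed day 2"},
    {"pattern": "docker compose up -d --no-deps *", "kind": "glob",
     "comment": "single-service restarts; --no-deps keeps the blast radius small"},
]

def is_whitelisted(command: str) -> bool:
    """Return True if the command matches any exact or glob rule."""
    for rule in WHITELIST:
        if rule["kind"] == "exact" and command == rule["pattern"]:
            return True
        if rule["kind"] == "glob" and fnmatch.fnmatch(command, rule["pattern"]):
            return True
    return False

print(is_whitelisted("docker compose up -d --no-deps backend"))  # True
print(is_whitelisted("docker exec -it api bash"))                # False
```

The comment field is the part that matters for onboarding: a new developer reads the rationale next to the rule, not a stale wiki page.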
The six incidents we prevented
"Incidents prevented" is inherently speculative — we can't know what would have happened if we'd let the commands through. But six times during the 30 days, someone denied a command that turned out to be wrong:
- The debugging shell (Day 3): docker exec -it left in a deploy script accidentally.
- The wrong target (Day 9): A migration was pointed at production, not staging. Risk score 72 (HIGH) triggered careful review. Caught before running.
- The force push (Day 14): git push origin main --force appeared in a GitHub Actions step. Risk score 45 (MEDIUM); the review made us look twice. The force push was unnecessary; the dev was cleaning up after a rebase that had already merged.
- The log rotation (Day 18): A cleanup script included rm -rf /var/log/app-*. Approved after review, but the review made us realize we should archive those logs, not delete them. Refactored before running.
- The off-hours schema change (Day 22): Someone kicked off a migration at 11pm on a Friday; anomaly detection fired on both the time and the high risk score. Denied. Ran it properly Monday morning.
- The double deploy (Day 28): A CI job got triggered twice. The second deploy was identical to the first, which had just completed. Denied — not harmful, but also unnecessary.
None of these were catastrophic. But incidents 2 and 3 in particular could have caused real problems if they'd run unreviewed. The cost of catching them: a few seconds of a developer's attention.
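The off-hours check that caught the day-22 migration is simple enough to sketch. The thresholds below are our illustration of the idea, not expacti's actual anomaly rules:

```python
from datetime import datetime

# Illustrative anomaly check: flag high-risk commands outside business hours.
# Thresholds (10pm cutoff, risk >= 70, weekends) are assumptions for the sketch.
def is_anomalous(risk_score: int, when: datetime) -> bool:
    off_hours = when.hour >= 22 or when.hour < 7 or when.weekday() >= 5
    return off_hours and risk_score >= 70

# The day-22 case: 11pm on a Friday with a high risk score gets flagged.
print(is_anomalous(80, datetime(2024, 6, 14, 23, 0)))  # True
print(is_anomalous(80, datetime(2024, 6, 17, 10, 0)))  # False (Monday morning)
```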
What was genuinely annoying
Honest account: three things caused real friction.
1. Late-night deploys
Anyone doing work after 10pm had to either wait for someone else to be online or approve their own commands (which defeats the purpose for critical operations). We ended up setting up a Slack channel specifically for late-night approvals. Not ideal.
The fix: for our team, this meant being more deliberate about when we deployed. We stopped doing production changes outside business hours unless genuinely necessary. Side effect: fewer off-hours incidents by attrition.
2. Long-running commands
The 60-second timeout was too short for some commands (database migrations on large tables, image builds). We bumped it to 300 seconds for known slow operations.
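In practice that meant overriding the timeout per step, using the same `timeout` input shown earlier (the migration command here is a hypothetical example):

```yaml
# Known-slow step: give reviewers five minutes instead of one.
- uses: expacti/expacti-action@v1
  with:
    command: "docker compose run --rm backend alembic upgrade head"
    backend_url: ${{ secrets.EXPACTI_URL }}
    shell_token: ${{ secrets.EXPACTI_TOKEN }}
    timeout: 300
```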
3. The whitelist maintenance burden
After 30 days we had 340 whitelist rules. Some of them were redundant (an exact match AND a glob for the same command family). The AI suggestions helped, but we still needed a cleanup session at the end of the month. Build up the whitelist intentionally, and prune it periodically.
What surprised us
The biggest surprise wasn't technical — it was cultural.
The act of approving made people read commands. Before, deploy scripts were committed, CI ran them, and unless something broke nobody looked closely. Now, every command had a human who consciously said "yes, this is fine." That changed how people felt about deploy scripts — they became documents, not just automation artifacts.
Within two weeks, we started seeing developers write clearer commands in their deploy scripts. Not because we told them to — but because they knew a human would be reading each one. The approval gate created an implicit code review for CI/CD configuration.
The 8.4 second average approval time is ours, with our team size and Slack notification setup. Your mileage will vary significantly based on how responsive your reviewers are. Budget for this when evaluating adoption.
Month two: what changed
We're still running it. Some changes after 30 days:
- Expanded coverage to include infrastructure changes (Terraform plans now route through the approval queue)
- Added a second reviewer requirement for database schema changes
- Cleaned up the whitelist from 340 to 180 rules using the AI suggestions export
- Onboarded our staging environment — turns out staging deploys were also worth reviewing once we could see them in the audit log
The question we get most often: "doesn't it slow you down?" The answer is: not meaningfully. The whitelist handles routine work automatically. Reviews only happen for the things that actually need a human. For us, that's roughly once or twice a day — and it's exactly the once or twice a day we'd want to be paying attention anyway.
Try it on your next deploy
One GitHub Actions step is all it takes. No config, no infrastructure changes.
▶ See the demo DevOps use case