Lesson 43 of 1570
Agent Safety: Sandboxes and Human-in-the-Loop
Giving an AI the keys to your computer is a big deal. Learn the two simplest ways to keep an agent safe: wall it off from things it shouldn't touch, and put a human in the decision path.
Lesson map
What this lesson covers, in order:
1. Why safety is not optional
2. Sandboxing
3. Human-in-the-loop
4. Least privilege
Section 1
Why safety is not optional
An agent with filesystem access can delete your thesis. An agent with email access can send 'you're fired' to your entire team. An agent with credit card access can order 400 rubber ducks. These are not hypothetical — every one has happened in production in the last 18 months. Safety isn't a feature, it's infrastructure.
Defense one: sandbox
A sandbox is a walled space where the agent can only see and touch specific things. Outside the wall, the agent has no power. Modern options in 2026:
- Vercel Sandbox — Firecracker microVMs for running untrusted code.
- Docker containers with bind-mounted folders only (classic and free).
- Anthropic's Claude Code with scoped permissions (per-directory, per-tool).
- Browser-only agents (Browser Use, Operator) — can't touch your filesystem.
- Virtual machines (VirtualBox, UTM, Parallels) — full isolation if you're paranoid.
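Whichever layer you pick, the underlying rule is the same: deny by default, then allow a few explicit roots. Here is a minimal Python sketch of that path check — the policy dict and function names are illustrative, not any platform's real API:

```python
from pathlib import Path

# Illustrative policy: deny everything by default, then allow
# explicit read and write roots (mirrors a deny-by-default config).
POLICY = {
    "read":  [Path.home() / "Downloads", Path.home() / "Desktop"],
    "write": [Path.home() / "Downloads" / "archive"],
}

def is_allowed(path: str, mode: str) -> bool:
    """Return True only if `path` sits under an allowed root for `mode`."""
    target = Path(path).expanduser().resolve()
    for root in POLICY.get(mode, []):
        root = root.resolve()
        if target == root or root in target.parents:
            return True
    # Deny by default: anything not explicitly allowed is blocked.
    return False
```

Note that a real enforcement layer also has to worry about symlinks and race conditions; resolving paths before checking them, as above, is the minimum, not the whole job.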
An example permission config. Deny by default. Allow exactly what's needed.
{
  "agent": "cleanup-bot",
  "sandbox": {
    "filesystem": {
      "read": ["~/Downloads", "~/Desktop"],
      "write": ["~/Downloads/archive"],
      "deny": ["~/Documents", "~/Library", "~/.ssh"]
    },
    "network": "none",
    "shell": false
  }
}

Defense two: human-in-the-loop
For any action that's destructive (delete, send, pay, push), require a human to approve before the agent proceeds. Yes, it slows things down. Yes, it's worth it. Every serious platform — Claude Code, Devin, OpenClaw Mission Control — has an approval gate feature.
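At its core, a gate is just a function that shows the action, reason, and impact, then blocks until a human says yes. A minimal Python sketch (the function name and prompt format are mine, not any platform's API):

```python
def approval_gate(action: str, reason: str, impact: str, ask=input) -> bool:
    """Pause before a destructive action; proceed only on explicit consent."""
    print(f"AGENT: I want to run: {action}")
    print(f"Reason: {reason}")
    print(f"Impact: {impact}")
    answer = ask("AWAITING APPROVAL (y/n): ").strip().lower()
    # Anything other than an explicit "y" counts as a refusal.
    return answer == "y"

# Usage: wrap every destructive call.
#   if approval_gate("rm -rf ~/old_project",
#                    "You asked me to clean up old projects.",
#                    "Deletes 1.2 GB, 847 files."):
#       ...run the deletion...
```

Passing `ask` as a parameter keeps the gate testable; in production it would be a UI prompt, a Slack message, or whatever surface your human actually watches.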
What an approval gate should look like: action, reason, impact, pause.
AGENT: I want to run: rm -rf ~/old_project
Reason: You asked me to clean up old projects.
Impact: Deletes 1.2 GB, 847 files.
AWAITING APPROVAL (y/n/details):

Tiered approval
Compare the options
| Risk | Approval policy | Examples |
|---|---|---|
| Low (read-only) | Auto-approve. | List files, read a page, query an API. |
| Medium (reversible) | Batch-approve with review. | Create a branch, draft an email. |
| High (destructive) | Per-action human approval. | Delete files, send emails, make payments. |
| Critical (irreversible) | Two-person approval + logging. | Deploy prod, wire money, delete users. |
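The table above maps naturally onto a small dispatcher: classify each action into a tier, and fall back to the strictest tier for anything unrecognized. A Python sketch (the action names and mapping are illustrative):

```python
from enum import Enum

class Risk(Enum):
    LOW = "auto-approve"
    MEDIUM = "batch-approve with review"
    HIGH = "per-action human approval"
    CRITICAL = "two-person approval + logging"

# Illustrative classification of actions into risk tiers.
RISK_OF = {
    "list_files":   Risk.LOW,
    "draft_email":  Risk.MEDIUM,
    "delete_files": Risk.HIGH,
    "deploy_prod":  Risk.CRITICAL,
}

def policy_for(action: str) -> str:
    # Unknown actions default to the strictest tier:
    # deny-by-default thinking applies to approval policies too.
    return RISK_OF.get(action, Risk.CRITICAL).value
```

The important design choice is the default: an action the agent invents mid-run should land in the critical tier, not slip through as low-risk.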
Good agent design is mostly about building the safe envelope, not the clever prompt. Get the envelope right and you can run bold experiments. Get it wrong and one bad run can ruin a week.
Related lessons
Keep going
Builders · 28 min
Chat AI vs. Agent AI: The Real Difference
A chatbot answers. An agent does. Learn the line between a model that talks and a model that acts — and why crossing it changes everything about how you work with AI.
Builders · 30 min
Why Agents Fail (and How to Notice)
Agents fail in weird, quiet, expensive ways. Learn the six failure modes, the warning signs, and the simple habits that catch problems before they compound.
Builders · 30 min
Cloud Agents vs. Local Agents: The Privacy Tradeoff
Your data can live in someone's data center or on your own laptop. Both are real options in 2026. Understand what you gain and lose with each.
