Lesson 43 of 1570
Agent Safety: Sandboxes and Human-in-the-Loop
Giving an AI the keys to your computer is a big deal. Learn the two simplest ways to keep an agent safe: wall it off from things it shouldn't touch, and put a human in the decision path.
Lesson map
What this lesson covers, in order:
1. Why safety is not optional
2. Sandboxing
3. Human-in-the-loop
4. Least privilege
Section 1
Why safety is not optional
An agent with filesystem access can delete your thesis. An agent with email access can send 'you're fired' to your entire team. An agent with credit card access can order 400 rubber ducks. These are not hypothetical — every one has happened in production in the last 18 months. Safety isn't a feature, it's infrastructure.
Defense one: sandbox
A sandbox is a walled space where the agent can only see and touch specific things. Outside the wall, the agent has no power. Modern options in 2026:
- Vercel Sandbox — Firecracker microVMs for running untrusted code.
- Docker containers with bind-mounted folders only (classic and free).
- Anthropic's Claude Code with scoped permissions (per-directory, per-tool).
- Browser-only agents (Browser Use, Operator) — can't touch your filesystem.
- Virtual machines (VirtualBox, UTM, Parallels) — full isolation if you're paranoid.
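Whichever layer you pick, the underlying rule is the same: deny by default, then allow a few explicit roots. Here is a minimal Python sketch of that path check — the policy dict and function names are illustrative, not any platform's real API:

```python
from pathlib import Path

# Illustrative policy: deny everything by default, then allow
# explicit read and write roots (mirrors a deny-by-default config).
POLICY = {
    "read":  [Path.home() / "Downloads", Path.home() / "Desktop"],
    "write": [Path.home() / "Downloads" / "archive"],
}

def is_allowed(path: str, mode: str) -> bool:
    """Return True only if `path` sits under an allowed root for `mode`."""
    target = Path(path).expanduser().resolve()
    for root in POLICY.get(mode, []):
        root = root.resolve()
        if target == root or root in target.parents:
            return True
    # Deny by default: anything not explicitly allowed is blocked.
    return False
```

Note that a real enforcement layer also has to worry about symlinks and race conditions; resolving paths before checking them, as above, is the minimum, not the whole job.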
An example permission config. Deny by default. Allow exactly what's needed.
{
  "agent": "cleanup-bot",
  "sandbox": {
    "filesystem": {
      "read": ["~/Downloads", "~/Desktop"],
      "write": ["~/Downloads/archive"],
      "deny": ["~/Documents", "~/Library", "~/.ssh"]
    },
    "network": "none",
    "shell": false
  }
}

Defense two: human-in-the-loop
For any action that's destructive (delete, send, pay, push), require a human to approve before the agent proceeds. Yes, it slows things down. Yes, it's worth it. Every serious platform — Claude Code, Devin, OpenClaw Mission Control — has an approval gate feature.
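At its core, a gate is just a function that shows the action, reason, and impact, then blocks until a human says yes. A minimal Python sketch (the function name and prompt format are mine, not any platform's API):

```python
def approval_gate(action: str, reason: str, impact: str, ask=input) -> bool:
    """Pause before a destructive action; proceed only on explicit consent."""
    print(f"AGENT: I want to run: {action}")
    print(f"Reason: {reason}")
    print(f"Impact: {impact}")
    answer = ask("AWAITING APPROVAL (y/n): ").strip().lower()
    # Anything other than an explicit "y" counts as a refusal.
    return answer == "y"

# Usage: wrap every destructive call.
#   if approval_gate("rm -rf ~/old_project",
#                    "You asked me to clean up old projects.",
#                    "Deletes 1.2 GB, 847 files."):
#       ...run the deletion...
```

Passing `ask` as a parameter keeps the gate testable; in production it would be a UI prompt, a Slack message, or whatever surface your human actually watches.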
What an approval gate should look like: action, reason, impact, pause.
AGENT: I want to run: rm -rf ~/old_project
Reason: You asked me to clean up old projects.
Impact: Deletes 1.2 GB, 847 files.
AWAITING APPROVAL (y/n/details):

Tiered approval
Compare the options
| Risk | Approval policy | Examples |
|---|---|---|
| Low (read-only) | Auto-approve. | List files, read a page, query an API. |
| Medium (reversible) | Batch-approve with review. | Create a branch, draft an email. |
| High (destructive) | Per-action human approval. | Delete files, send emails, make payments. |
| Critical (irreversible) | Two-person approval + logging. | Deploy prod, wire money, delete users. |
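The table above maps naturally onto a small dispatcher: classify each action into a tier, and fall back to the strictest tier for anything unrecognized. A Python sketch (the action names and mapping are illustrative):

```python
from enum import Enum

class Risk(Enum):
    LOW = "auto-approve"
    MEDIUM = "batch-approve with review"
    HIGH = "per-action human approval"
    CRITICAL = "two-person approval + logging"

# Illustrative classification of actions into risk tiers.
RISK_OF = {
    "list_files":   Risk.LOW,
    "draft_email":  Risk.MEDIUM,
    "delete_files": Risk.HIGH,
    "deploy_prod":  Risk.CRITICAL,
}

def policy_for(action: str) -> str:
    # Unknown actions default to the strictest tier:
    # deny-by-default thinking applies to approval policies too.
    return RISK_OF.get(action, Risk.CRITICAL).value
```

The important design choice is the default: an action the agent invents mid-run should land in the critical tier, not slip through as low-risk.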
Good agent design is mostly about building the safe envelope, not the clever prompt. Get the envelope right and you can run bold experiments. Get it wrong and one bad run can ruin a week.
Related lessons
Keep going
Builders · 28 min
Chat AI vs. Agent AI: The Real Difference
A chatbot answers. An agent does. Learn the line between a model that talks and a model that acts — and why crossing it changes everything about how you work with AI.
Builders · 30 min
Why Agents Fail (and How to Notice)
Agents fail in weird, quiet, expensive ways. Learn the six failure modes, the warning signs, and the simple habits that catch problems before they compound.
Builders · 30 min
Cloud Agents vs. Local Agents: The Privacy Tradeoff
Your data can live in someone's data center or on your own laptop. Both are real options in 2026. Understand what you gain and lose with each.
