Giving an AI the keys to your computer is a big deal. Learn the two simplest ways to keep an agent safe: wall it off from things it shouldn't touch, and put a human in the decision path.
An agent with filesystem access can delete your thesis. An agent with email access can send 'you're fired' to your entire team. An agent with credit card access can order 400 rubber ducks. These are not hypothetical — every one has happened in production in the last 18 months. Safety isn't a feature, it's infrastructure.
A sandbox is a walled space where the agent can only see and touch specific things. Outside the wall, the agent has no power. Modern options in 2026:

    {
      "agent": "cleanup-bot",
      "sandbox": {
        "filesystem": {
          "read": ["~/Downloads", "~/Desktop"],
          "write": ["~/Downloads/archive"],
          "deny": ["~/Documents", "~/Library", "~/.ssh"]
        },
        "network": "none",
        "shell": false
      }
    }

An example permission config. Deny by default. Allow exactly what's needed.

For any action that's destructive (delete, send, pay, push), require a human to approve before the agent proceeds. Yes, it slows things down. Yes, it's worth it. Every serious platform (Claude Code, Devin, OpenClaw Mission Control) has an approval gate feature.
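
The deny-by-default rule in the permission config above can be sketched as a small path check. This is only an illustration, not how any particular platform enforces its sandbox: the `HOME` location, the `POLICY` dictionary, and the `is_allowed` helper are hypothetical names, and the check works on path strings only (it does not resolve symlinks, which a real sandbox would have to handle at the OS level).

```python
from pathlib import PurePosixPath

# Assumed home directory, used only to expand the "~" paths from the config.
HOME = PurePosixPath("/home/user")

# Hypothetical in-memory copy of the example config's filesystem rules.
POLICY = {
    "read": ["Downloads", "Desktop"],
    "write": ["Downloads/archive"],
    "deny": ["Documents", "Library", ".ssh"],
}


def _is_under(path: PurePosixPath, root: PurePosixPath) -> bool:
    """Return True if path is root itself or sits somewhere inside it."""
    return path == root or root in path.parents


def is_allowed(path: str, mode: str) -> bool:
    """Deny by default: allow only paths under an allow-listed root for this mode."""
    target = PurePosixPath(path)
    # Refuse relative paths and ".." segments instead of trying to normalize them.
    if not target.is_absolute() or ".." in target.parts:
        return False
    # An explicit deny always wins over an allow.
    if any(_is_under(target, HOME / denied) for denied in POLICY["deny"]):
        return False
    # Unknown modes have no allow list, so they fall through to False.
    return any(_is_under(target, HOME / ok) for ok in POLICY.get(mode, []))
```

With this sketch, `is_allowed("/home/user/Downloads/report.pdf", "read")` is allowed, while writing to the same file, reading anything under `~/.ssh`, or touching a path outside the listed folders is refused without needing a rule for each case.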

    AGENT: I want to run: rm -rf ~/old_project
    Reason: You asked me to clean up old projects.
    Impact: Deletes 1.2 GB, 847 files.
    AWAITING APPROVAL (y/n/details):

What an approval gate should look like: action, reason, impact, pause.

| Risk | Approval policy | Examples |
|---|---|---|
| Low (read-only) | Auto-approve. | List files, read a page, query an API. |
| Medium (reversible) | Batch-approve with review. | Create a branch, draft an email. |
| High (destructive) | Per-action human approval. | Delete files, send emails, make payments. |
| Critical (irreversible) | Two-person approval + logging. | Deploy prod, wire money, delete users. |
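
The risk tiers in the table can be sketched as an approval gate that pauses until enough humans say yes. This is a minimal sketch under stated assumptions: `Risk`, `ProposedAction`, `gate`, and the `ask` callback are hypothetical names, "batch-approve with review" is simplified to a single approval, and every decision is logged rather than only critical ones.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Risk(Enum):
    LOW = "low"            # read-only
    MEDIUM = "medium"      # reversible
    HIGH = "high"          # destructive
    CRITICAL = "critical"  # irreversible


@dataclass
class ProposedAction:
    command: str
    reason: str
    impact: str
    risk: Risk


# Distinct human approvals needed per tier (assumed mapping of the table).
REQUIRED_APPROVALS = {Risk.LOW: 0, Risk.MEDIUM: 1, Risk.HIGH: 1, Risk.CRITICAL: 2}


def format_request(action: ProposedAction) -> str:
    """Render the prompt a human sees: action, reason, impact, pause."""
    return (
        f"AGENT: I want to run: {action.command}\n"
        f"Reason: {action.reason}\n"
        f"Impact: {action.impact}\n"
        "AWAITING APPROVAL (y/n/details):"
    )


def gate(
    action: ProposedAction,
    ask: Callable[[str, str], bool],
    approvers: List[str],
    log: List[str],
) -> bool:
    """Return True only if the tier's required number of distinct people approve."""
    needed = REQUIRED_APPROVALS[action.risk]
    distinct = list(dict.fromkeys(approvers))  # drop duplicates, keep order
    if len(distinct) < needed:
        log.append(f"DENIED {action.command}: not enough approvers")
        return False
    prompt = format_request(action)
    for person in distinct[:needed]:
        # ask() stands in for whatever UI collects the human's answer.
        if not ask(person, prompt):
            log.append(f"DENIED {action.command}: rejected by {person}")
            return False
    log.append(f"APPROVED {action.command}")
    return True
```

Passing `ask` in as a callback keeps the policy separate from the interface: the same gate could sit behind a terminal prompt, a chat message, or a review queue, and the agent never runs the command unless `gate` returns `True`.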
Good agent design is mostly about building the safe envelope, not the clever prompt. Get the envelope right and you can run bold experiments. Get it wrong and one bad run can ruin a week.
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-agentic-safety-sandbox-builders
What is the main idea of "Agent Safety: Sandboxes and Human-in-the-Loop"?
Which concept is most central to "Agent Safety: Sandboxes and Human-in-the-Loop"?
Which use of AI fits this topic best?
What should a careful learner remember about "Assume the agent will mess up"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about sandboxing be treated?
Name one way to verify an AI answer about sandboxing.
Which action would help you apply "Agent Safety: Sandboxes and Human-in-the-Loop" responsibly?