Giving an AI the keys to your computer is a big deal. Learn the two simplest ways to keep an agent safe: wall it off from things it shouldn't touch, and put a human in the decision path.
An agent with filesystem access can delete your thesis. An agent with email access can send 'you're fired' to your entire team. An agent with credit card access can order 400 rubber ducks. These are not hypothetical — every one has happened in production in the last 18 months. Safety isn't a feature, it's infrastructure.
A sandbox is a walled space where the agent can only see and touch specific things. Outside the wall, the agent has no power. In practice, a sandbox is defined by a permission config like this one:
{
  "agent": "cleanup-bot",
  "sandbox": {
    "filesystem": {
      "read": ["~/Downloads", "~/Desktop"],
      "write": ["~/Downloads/archive"],
      "deny": ["~/Documents", "~/Library", "~/.ssh"]
    },
    "network": "none",
    "shell": false
  }
}

An example permission config. Deny by default. Allow exactly what's needed.

For any action that's destructive (delete, send, pay, push), require a human to approve before the agent proceeds. Yes, it slows things down. Yes, it's worth it. Every serious platform, from Claude Code and Devin to OpenClaw Mission Control, has an approval gate feature.
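The permission config above can be enforced with a deny-by-default path check. This is a minimal sketch, not a real sandbox runtime: the `POLICY` dict mirrors the example config, and the names `PolicyError` and `check_access` are illustrative.

```python
from pathlib import Path

# Mirrors the example permission config above.
POLICY = {
    "read":  ["~/Downloads", "~/Desktop"],
    "write": ["~/Downloads/archive"],
    "deny":  ["~/Documents", "~/Library", "~/.ssh"],
}

class PolicyError(PermissionError):
    pass

def _expand(p: str) -> Path:
    return Path(p).expanduser().resolve()

def _under(path: Path, roots: list) -> bool:
    return any(path.is_relative_to(_expand(r)) for r in roots)

def check_access(path: str, mode: str) -> None:
    """Deny by default: raise unless `path` is explicitly allowed for `mode`."""
    p = _expand(path)
    if _under(p, POLICY["deny"]):          # the deny list beats any allow list
        raise PolicyError(f"denied: {p}")
    if not _under(p, POLICY.get(mode, [])):
        raise PolicyError(f"not allowed for {mode}: {p}")

check_access("~/Downloads/report.pdf", "read")   # allowed, no exception
try:
    check_access("~/.ssh/id_rsa", "read")
except PolicyError as e:
    print("blocked:", e)
```

Note the design choice: the deny list is checked first and always wins, so adding a broad allow rule later can never accidentally expose `~/.ssh`.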
AGENT: I want to run: rm -rf ~/old_project
Reason: You asked me to clean up old projects.
Impact: Deletes 1.2 GB, 847 files.
AWAITING APPROVAL (y/n/details):

What an approval gate should look like: action, reason, impact, pause.

| Risk | Approval policy | Examples |
|---|---|---|
| Low (read-only) | Auto-approve. | List files, read a page, query an API. |
| Medium (reversible) | Batch-approve with review. | Create a branch, draft an email. |
| High (destructive) | Per-action human approval. | Delete files, send emails, make payments. |
| Critical (irreversible) | Two-person approval + logging. | Deploy prod, wire money, delete users. |
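The transcript and the risk table above can be sketched together as one gate function: show the action, reason, and impact, then pause for as many human approvals as the risk tier demands. Everything here (the `Risk` enum, `Action`, `gate`, the approver counts) is an illustrative sketch, not any platform's real API.

```python
from dataclasses import dataclass
from enum import Enum

class Risk(Enum):
    LOW = "low"            # read-only
    MEDIUM = "medium"      # reversible
    HIGH = "high"          # destructive
    CRITICAL = "critical"  # irreversible

# Distinct human approvals required per action at each tier (per the table).
APPROVERS_REQUIRED = {Risk.LOW: 0, Risk.MEDIUM: 0, Risk.HIGH: 1, Risk.CRITICAL: 2}

@dataclass
class Action:
    command: str
    reason: str
    impact: str
    risk: Risk

def gate(action: Action, ask=input) -> bool:
    """Surface action, reason, and impact, then pause for human approval."""
    needed = APPROVERS_REQUIRED[action.risk]
    if needed == 0:
        return True        # low/medium tiers are auto- or batch-approved
    print(f"AGENT: I want to run: {action.command}")
    print(f"Reason: {action.reason}")
    print(f"Impact: {action.impact}")
    for i in range(needed):  # critical tier pauses twice: two-person rule
        answer = ask(f"AWAITING APPROVAL {i + 1}/{needed} (y/n): ")
        if answer.strip().lower() != "y":
            return False     # anything but an explicit yes is a no
    return True

# Usage: inject `ask` so the gate can run (and be tested) without a terminal.
cleanup = Action("rm -rf ~/old_project",
                 "You asked me to clean up old projects.",
                 "Deletes 1.2 GB, 847 files.",
                 Risk.HIGH)
print("approved:", gate(cleanup, ask=lambda _: "n"))
```

Injecting `ask` as a parameter keeps the pause testable; defaulting the denial path (anything but an explicit "y" is a "no") means a timeout or a typo fails safe.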
Good agent design is mostly about building the safe envelope, not the clever prompt. Get the envelope right and you can run bold experiments. Get it wrong and one bad run can ruin a week.