Lesson 257 of 1570
Prompt Injection: The Agent Era's SQL Injection
When AI can read documents and act on them, hidden instructions become attacks. Here is what prompt injection is and why nobody has fully solved it.
Lesson map
What this lesson covers
Learning path
The main moves in order
- 1The Trust Boundary Problem
- 2prompt injection
- 3indirect injection
- 4agent security
Concept cluster
Terms to connect while reading
Section 1
The Trust Boundary Problem
When a model only reads what you type, it is clear what counts as an instruction. When a model reads a webpage, a PDF, an email, or a tool response, the boundary gets fuzzy. Any text anywhere in its context window might say do this instead, and the model might obey.
Two flavors
- Direct injection: the user types the attack themselves (often to bypass safety)
- Indirect injection: someone else planted the attack in content the model reads
Why this is the agent era's SQL injection
In the web era, SQL injection happened because developers concatenated user input into database queries. The database could not tell code from data. Prompt injection has the same shape: the model cannot reliably tell instructions from content.
The difference is worse: SQL injection is fixed by parameterized queries. There is no known full fix for prompt injection. Current defenses reduce it; none eliminate it.
Defenses that help
- 1Instruction isolation: system prompt in a privileged channel the model is trained to trust more
- 2Content labeling: mark document text as data, not instructions
- 3Capability gating: never let the model send email without a human click
- 4Sandboxing: agents run in constrained environments with no network
- 5Output filtering: scan results for suspicious patterns before acting
- 6User confirmation: high-stakes actions require explicit approval
Compare the options
| Attacker goal | Vector | Defense |
|---|---|---|
| Exfiltrate data | Hidden text tells agent to email files | Block outbound email without human approval |
| Manipulate decisions | Biased content in a document | Cross-reference against trusted sources |
| Run harmful tools | Instruction to call delete-all tool | Require confirmation for destructive actions |
| Phish the user | Fake authority instruction in page | Warn user, never auto-click injected links |
“Every piece of data an agent reads is a potential prompt. Design like every document is a letter from an adversary.”
Key terms in this lesson
The big idea: the more useful AI agents get, the more they read from the world, and the more they read, the more attack surface they have. Mitigations exist; a perfect fix does not.
End-of-lesson quiz
Check what stuck
15 questions · Score saves to your progress.
Tutor
Curious about “Prompt Injection: The Agent Era's SQL Injection”?
Ask anything about this lesson. I’ll answer using just what you’re reading — short, friendly, grounded.
Progress saved locally in this browser. Sign in to sync across devices.
Related lessons
Keep going
Builders · 25 min
The Environmental Cost of Training a Big Model
Training a frontier model uses the electricity of a small city for months. Running inference at scale matches a large country's load. Here is what the numbers actually look like.
Builders · 25 min
Japan's Soft-Law AI Framework
Japan chose light-touch, guideline-based AI governance built on existing laws. Understanding why illuminates a real alternative to comprehensive AI acts.
Builders · 25 min
Red-Teaming: People Paid to Break AI
Red-teamers try to make models misbehave before bad actors do. Here is how the job works, who does it, and what they look for.
