Loading lesson…
When AI can read documents and act on them, hidden instructions become attacks. Here is what prompt injection is and why nobody has fully solved it.
When a model only reads what you type, it is clear what counts as an instruction. When a model reads a webpage, a PDF, an email, or a tool response, the boundary gets fuzzy. Any text anywhere in its context window might say do this instead, and the model might obey.
In the web era, SQL injection happened because developers concatenated user input into database queries. The database could not tell code from data. Prompt injection has the same shape: the model cannot reliably tell instructions from content.
The difference is worse: SQL injection is fixed by parameterized queries. There is no known full fix for prompt injection. Current defenses reduce it; none eliminate it.
| Attacker goal | Vector | Defense |
|---|---|---|
| Exfiltrate data | Hidden text tells agent to email files | Block outbound email without human approval |
| Manipulate decisions | Biased content in a document | Cross-reference against trusted sources |
| Run harmful tools | Instruction to call delete-all tool | Require confirmation for destructive actions |
| Phish the user | Fake authority instruction in page | Warn user, never auto-click injected links |
Every piece of data an agent reads is a potential prompt. Design like every document is a letter from an adversary.
— Simon Willison, independent researcher
The big idea: the more useful AI agents get, the more they read from the world, and the more they read, the more attack surface they have. Mitigations exist; a perfect fix does not.
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-safety-prompt-injection-builders
What is the main idea of "Prompt Injection: The Agent Era's SQL Injection"?
Which concept is most central to "Prompt Injection: The Agent Era's SQL Injection"?
Which use of AI fits this topic best?
What should a careful learner remember about "A real 2023 example"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about prompt injection be treated?
Name one way to verify an AI answer about prompt injection.
Which action would help you apply "Prompt Injection: The Agent Era's SQL Injection" responsibly?