When AI can read documents and act on them, hidden instructions become attacks. Here is what prompt injection is and why nobody has fully solved it.
When a model only reads what you type, it is clear what counts as an instruction. When a model reads a webpage, a PDF, an email, or a tool response, the boundary gets fuzzy. Any text anywhere in its context window might say "do this instead," and the model might obey.
In the web era, SQL injection happened because developers concatenated user input into database queries. The database could not tell code from data. Prompt injection has the same shape: the model cannot reliably tell instructions from content.
The difference is what makes it worse: SQL injection is fixed by parameterized queries, but there is no known full fix for prompt injection. Current defenses reduce the risk; none eliminate it.
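To see the shape of the analogy in code, here is a minimal sketch using Python's built-in sqlite3 module. The table and the attacker string are invented for illustration; the contrast between the two queries is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "alice' OR '1'='1"  # attacker-supplied text

# Vulnerable: the input is concatenated into the query string, so the
# database parses the attacker's text as SQL code and returns every row.
rows = conn.execute(
    "SELECT * FROM users WHERE name = '" + user_input + "'"
).fetchall()
print(rows)  # [('alice', 'admin')]: the OR '1'='1' clause ran as code

# Fixed: a parameterized query binds the input as a value. Whatever the
# attacker types, it can only ever be data, never code.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # []: no user is literally named "alice' OR '1'='1"
```

Prompt injection has no equivalent of that second query: there is no channel that delivers a webpage or email to a model strictly as data, guaranteed never to be interpreted as instructions.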
| Attacker goal | Vector | Defense |
|---|---|---|
| Exfiltrate data | Hidden text tells agent to email files | Block outbound email without human approval |
| Manipulate decisions | Biased content in a document | Cross-reference against trusted sources |
| Run harmful tools | Instruction to call delete-all tool | Require confirmation for destructive actions |
| Phish the user | Fake authority instruction in page | Warn user, never auto-click injected links |
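A common thread in the Defense column is that text alone should never trigger consequences. Here is a minimal sketch of capability gating with human confirmation; the tool registry, tool names, and approval flag are hypothetical, not any real framework's API.

```python
# Hypothetical capability gate between a model's tool requests and
# their execution. Tool names and the approval flow are illustrative.
TOOLS = {
    "read_page": lambda url: f"(contents of {url})",
    "send_email": lambda to, body: f"email sent to {to}",
}
DESTRUCTIVE = {"send_email"}  # anything outbound or irreversible

def run_tool(name: str, args: dict, approved_by_human: bool = False) -> str:
    """Run a model-requested tool call, gating the risky ones.

    Injected text can make the model *request* send_email, but the
    approval flag is set in the UI, outside the model's context
    window, so nothing the agent reads can flip it.
    """
    if name in DESTRUCTIVE and not approved_by_human:
        raise PermissionError(f"'{name}' requires explicit human approval")
    return TOOLS[name](**args)

# Reading is allowed; the exfiltration attempt from the table is not.
print(run_tool("read_page", {"url": "https://example.test/invoice"}))
try:
    run_tool("send_email", {"to": "attacker@example.test", "body": "files"})
except PermissionError as err:
    print(err)  # 'send_email' requires explicit human approval
```

The design point is that the approval signal travels on a channel the model's inputs cannot write to; gating limits the blast radius even when the injection itself succeeds.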
> Every piece of data an agent reads is a potential prompt. Design like every document is a letter from an adversary.
>
> — Simon Willison, independent researcher
The big idea: the more useful AI agents get, the more they read from the world, and the more they read, the more attack surface they have. Mitigations exist; a perfect fix does not.
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-safety-prompt-injection-builders
1. Why do security experts call prompt injection "the agent era's SQL injection"?
2. What makes the "trust boundary problem" especially difficult for AI agents?
3. A researcher hid text on a webpage that made Bing Chat respond in pirate English and insert a phishing link. What type of attack was this?
4. Which defense involves training an AI to treat certain instructions as more trustworthy than others?
5. Why is there no complete fix for prompt injection the way there is for SQL injection?
6. What does "capability gating" mean as a defense against prompt injection?
7. What is the "big idea" about AI agents and prompt injection risk?
8. What does "sandboxing" do as a defense against prompt injection?
9. An attacker hides instructions in a document telling an AI to email sensitive files to an attacker-controlled address. What defense would help most against this?
10. What is "content labeling" as a defense?
11. A webpage contains fake authority instructions telling an AI to present malicious links to users. What should users be warned about?
12. Why is it risky to let an AI agent take irreversible actions on content you haven't read?
13. The quote "Every piece of data an agent reads is a potential prompt" means:
14. A biased document might trick an AI into making unfair decisions. What defense helps against this?
15. What makes prompt injection worse than SQL injection in terms of available solutions?