Tendril

Lesson 257 of 1570

Prompt Injection: The Agent Era's SQL Injection

When AI can read documents and act on them, hidden instructions become attacks. Here is what prompt injection is and why nobody has fully solved it.

BuildersEthics & Society~18 min readIntermediateProfessionalBI5 · Societal ImpactBI3 · LearningPrint / PDF

Lesson map

What this lesson covers

30 min17 blocks3 concepts

Learning path

The main moves in order

1The Trust Boundary Problem
2prompt injection
3indirect injection
4agent security

Concept cluster

Terms to connect while reading

prompt injectionindirect injectionagent security

Sections4

Lists2

Notes4

Compare1

Quotes1

Section 1

The Trust Boundary Problem

When a model only reads what you type, it is clear what counts as an instruction. When a model reads a webpage, a PDF, an email, or a tool response, the boundary gets fuzzy. Any text anywhere in its context window might say do this instead, and the model might obey.

Two flavors

Direct injection: the user types the attack themselves (often to bypass safety)
Indirect injection: someone else planted the attack in content the model reads

Check-in 1. Got it so far?

Why this is the agent era's SQL injection

In the web era, SQL injection happened because developers concatenated user input into database queries. The database could not tell code from data. Prompt injection has the same shape: the model cannot reliably tell instructions from content.

The difference is worse: SQL injection is fixed by parameterized queries. There is no known full fix for prompt injection. Current defenses reduce it; none eliminate it.

Defenses that help

1Instruction isolation: system prompt in a privileged channel the model is trained to trust more
2Content labeling: mark document text as data, not instructions
3Capability gating: never let the model send email without a human click
4Sandboxing: agents run in constrained environments with no network
5Output filtering: scan results for suspicious patterns before acting
6User confirmation: high-stakes actions require explicit approval

Check-in 2. Got it so far?

Compare the options

Attacker goal	Vector	Defense
Exfiltrate data	Hidden text tells agent to email files	Block outbound email without human approval
Manipulate decisions	Biased content in a document	Cross-reference against trusted sources
Run harmful tools	Instruction to call delete-all tool	Require confirmation for destructive actions
Phish the user	Fake authority instruction in page	Warn user, never auto-click injected links

“Every piece of data an agent reads is a potential prompt. Design like every document is a letter from an adversary.”
Simon Willison, independent researcher

Check-in 3. Got it so far?

Key terms in this lesson

The big idea: the more useful AI agents get, the more they read from the world, and the more they read, the more attack surface they have. Mitigations exist; a perfect fix does not.

Check-in 4. Got it so far?

End-of-lesson quiz

Check what stuck

15 questions · Score saves to your progress.

Tutor

Curious about “Prompt Injection: The Agent Era's SQL Injection”?

Ask anything about this lesson. I’ll answer using just what you’re reading — short, friendly, grounded.

Progress saved locally in this browser. Sign in to sync across devices.

Related lessons

Prompt Injection: The Agent Era's SQL Injection

The Trust Boundary Problem

Two flavors

Why this is the agent era's SQL injection

Defenses that help

Curious about “Prompt Injection: The Agent Era's SQL Injection”?

Keep going

Prompt Injection: The Agent Era's SQL Injection

The Trust Boundary Problem

Two flavors

Why this is the agent era's SQL injection

Defenses that help

Curious about “Prompt Injection: The Agent Era's SQL Injection”?

Keep going