Opt-Out Mechanisms: The Real State of Consent
Many AI companies now offer opt-outs from training. But how well do they actually work, and what are the catches?
Section 1
Opt-Out Is Not Opt-In
By default, most publicly accessible web content is treated as fair game for AI training unless its owner actively opts out. This is the opposite of GDPR's consent model, where permission has to be given before data is used. The ethics are debated, but the practical reality is that individuals and site owners must proactively block usage rather than grant it.
The main opt-out channels
Compare the options
| Channel | Blocks what | Effectiveness |
|---|---|---|
| robots.txt + GPTBot | OpenAI's crawler | Works for OpenAI only |
| robots.txt + ClaudeBot | Anthropic's crawler | Works for Anthropic only |
| ai.txt (proposed) | AI training specifically | Voluntary, patchy adoption |
| DoNotTrain meta tag | Crawlers that choose to honor it | Limited |
| OpenAI individual opt-out form | Future training | Only OpenAI, does not affect already-trained models |
| Spawning.ai's Have I Been Trained | Future training on LAION-based datasets, Stability AI | Depends on compliance |
Common mistakes in opt-outs
- Blocking Googlebot instead of AI crawlers (different user agents)
- Assuming opt-out applies retroactively (it rarely does)
- Forgetting subdomains and CDN versions
- Not updating when new AI crawlers launch
- Assuming all crawlers obey robots.txt (many do not)
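Several of these mistakes can be caught by testing a robots.txt before relying on it. The sketch below uses Python's standard-library `urllib.robotparser` to report which AI user agents a given file still lets through. The `AI_USER_AGENTS` watch list and the `unblocked_ai_agents` helper are illustrative assumptions, not an official registry or tool, and the standard-library matcher may not behave exactly like every real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical watch list (an assumption): extend it as new AI crawlers launch.
AI_USER_AGENTS = [
    "GPTBot", "ClaudeBot", "Google-Extended",
    "CCBot", "anthropic-ai", "Bytespider",
]


def unblocked_ai_agents(robots_txt, agents=AI_USER_AGENTS, path="/"):
    """Return the agents that this robots.txt text still allows to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if parser.can_fetch(agent, path)]


# Mistake: blocking Googlebot stops search indexing, not AI crawlers.
wrong = "User-agent: Googlebot\nDisallow: /\n"

# Partial fix: only two AI crawlers are named.
partial = (
    "User-agent: GPTBot\nDisallow: /\n\n"
    "User-agent: ClaudeBot\nDisallow: /\n"
)

print(unblocked_ai_agents(wrong))    # every agent on the watch list
print(unblocked_ai_agents(partial))  # the agents not named in the file
```

A robots.txt applies only to the host that serves it, so the same check would need to run against each subdomain's file. It also reports only what the file declares; it cannot tell you whether a crawler actually obeys it.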
A working robots.txt
A robots.txt blocking major AI crawlers
# Block OpenAI, Anthropic, Google (AI training), Common Crawl, ByteDance
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Bytespider
Disallow: /
The retroactive problem
If your content was scraped and used for training before you opted out, it has already shaped the model. Most labs say they cannot cleanly remove individual training data from a trained model. At best, they promise to exclude it from future training runs. This is the most common complaint from artists, writers, and other content creators.
The big idea: opt-out mechanisms exist but are patchy, rarely retroactive (future exclusion rests on a promise), and depend on good-faith crawlers. Real consent requires a different default.