Opt-Out Mechanisms: The Real State of Consent
Many AI companies now offer opt-outs from training. But how well do they actually work, and what are the catches?
Adults & Professionals · AI Foundations · ~17 min read
Opt-Out Is Not Opt-In
By default, most AI developers treat publicly accessible web content as available for training unless the owner actively opts out. This inverts GDPR's consent model, where permission is granted before data is used. The ethics are debated, but in practice the burden falls on individuals to block usage rather than to grant it.
The main opt-out channels
Compare the options
| Channel | Blocks what | Effectiveness |
|---|---|---|
| robots.txt + GPTBot | OpenAI's crawler | Works for OpenAI |
| robots.txt + ClaudeBot | Anthropic's crawler | Works for Anthropic |
| ai.txt (proposed) | AI training specifically | Voluntary, patchy adoption |
| DoNotTrain meta tag | Crawlers that choose to honor it | Limited |
| OpenAI individual opt-out form | Future training | OpenAI only; does not affect already-trained models |
| Spawning.ai's Have I Been Trained | Future training on LAION-derived datasets (e.g., by Stability) | Depends on compliance |
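The robots.txt rows above apply per user-agent token, so it helps to check which tokens a given file actually blocks. The following is a minimal sketch using Python's standard `urllib.robotparser`; the function name `blocked_agents` and the sample file are illustrative assumptions, not part of any official tool.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt: str, agents: list[str], path: str = "/") -> dict[str, bool]:
    """Return {agent: True if robots.txt disallows `path` for that agent}."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, path) for agent in agents}


# Illustrative sample: only two of the crawlers named in this lesson are blocked.
sample = """User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
"""

for agent, blocked in blocked_agents(sample, ["GPTBot", "ClaudeBot", "CCBot"]).items():
    print(f"{agent}: {'blocked' if blocked else 'NOT blocked'}")
```

This only reports what the file asks for. Whether a crawler complies is a separate question.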
Common mistakes in opt-outs
- Blocking Googlebot instead of AI crawlers (they use different user-agent tokens; Google's AI-training control is Google-Extended)
- Assuming opt-out applies retroactively (it rarely does)
- Forgetting subdomains and CDN versions
- Not updating when new AI crawlers launch
- Assuming all crawlers obey robots.txt (many do not)
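Because robots.txt is only a request, some site owners also refuse AI crawlers at the server. Below is a minimal sketch of that idea, assuming you can inspect the `User-Agent` header before responding. The names `AI_CRAWLER_TOKENS`, `is_ai_crawler`, and `status_for` are hypothetical, and the token list is taken from the crawlers named in this lesson, so it would need updating as new crawlers launch.

```python
# Hypothetical token list for illustration; extend it as new crawlers appear.
# Google-Extended is deliberately absent: it is a robots.txt control token,
# not a User-Agent string sent with HTTP requests.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "anthropic-ai", "CCBot", "Bytespider")


def is_ai_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match of the header against known tokens."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def status_for(user_agent: str) -> int:
    """Return the HTTP status a server might send: 403 for AI crawlers, else 200."""
    return 403 if is_ai_crawler(user_agent) else 200


# Illustrative header values, not guaranteed to match current crawler strings.
print(status_for("Mozilla/5.0 (compatible; GPTBot/1.0)"))       # 403
print(status_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))    # 200
```

This still depends on the crawler identifying itself honestly. A crawler that sends a browser-like header passes the check, so treat it as one layer rather than a guarantee.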
A working robots.txt
A robots.txt blocking major AI crawlers
# Block AI crawlers from OpenAI, Anthropic, Google (Google-Extended), Common Crawl, and ByteDance
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Bytespider
Disallow: /
The retroactive problem
If your content was scraped and used in training before you opted out, it has already shaped the model. Most labs say they cannot cleanly remove individual training data from a trained model. At best, they promise to exclude it from future training runs. This is a common complaint from artists, writers, and content creators.
The big idea: opt-out mechanisms exist, but they are patchy, forward-looking rather than retroactive, and dependent on good-faith crawlers. Real consent requires a different default.