Tendril

Lesson 304 of 2116

Opt-Out Mechanisms: The Real State of Consent

Many AI companies now offer opt-outs from training. But how well do they actually work, and what are the catches?

Opt-Out Is Not Opt-In

By default, most AI developers treat publicly accessible web content as available for training unless the owner actively opts out. This inverts the GDPR's consent model, under which permission must be granted up front. The ethics are debated, but in practice the burden falls on individuals and site owners: they must proactively block usage rather than grant it.

The main opt-out channels

Channel	Blocks what	Effectiveness
robots.txt + GPTBot	OpenAI's crawler	Works for OpenAI
robots.txt + ClaudeBot	Anthropic's crawler	Works for Anthropic
ai.txt (proposed)	AI training specifically	Voluntary, patchy adoption
DoNotTrain meta tag	Training by crawlers and platforms that honor it	Limited; honoring it is voluntary
OpenAI individual opt-out form	Future training	Only OpenAI, does not affect already-trained models
Spawning.ai's Have I Been Trained	Future LAION, Stability	Depends on compliance

Common mistakes in opt-outs

Blocking Googlebot instead of AI crawlers (different user agents)
Assuming opt-out applies retroactively (it rarely does)
Forgetting subdomains and CDN versions
Not updating when new AI crawlers launch
Assuming all crawlers obey robots.txt (many do not)
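
Several of these mistakes can be caught with a quick audit. The sketch below uses Python's standard urllib.robotparser module to report which AI crawler tokens a robots.txt still allows on each host. The host names, the AI_AGENTS list, and the helper names unblocked_agents and audit_hosts are assumptions made for illustration, and the robots.txt text is passed in as a string, so fetching it from every subdomain and CDN host is left out. It checks only what the file says, using the standard library's matching rules; it cannot tell you whether a given crawler actually obeys the file.

```python
from urllib.robotparser import RobotFileParser

# Assumed list for illustration; crawler names change, so keep it up to date.
AI_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "Bytespider"]


def unblocked_agents(robots_txt, agents=AI_AGENTS, path="/"):
    """Return the agents that this robots.txt text still allows to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if parser.can_fetch(agent, path)]


def audit_hosts(robots_by_host):
    """Map each host name to the AI agents its robots.txt does not block."""
    return {host: unblocked_agents(text) for host, text in robots_by_host.items()}


if __name__ == "__main__":
    # Hypothetical hosts: the main site blocks one crawler, and the CDN copy
    # blocks only Googlebot, which is a different user agent from the AI ones.
    report = audit_hosts({
        "example.com": "User-agent: GPTBot\nDisallow: /\n",
        "cdn.example.com": "User-agent: Googlebot\nDisallow: /\n",
    })
    for host, agents in report.items():
        print(host, "still allows:", ", ".join(agents) or "none")
```

An empty list for a host means every token in AI_AGENTS is disallowed there. Running the audit against each subdomain separately matters because robots.txt applies per host, not per organization.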

A working robots.txt

A robots.txt blocking major AI crawlers

# Block AI crawlers from OpenAI, Anthropic, Google (Google-Extended), Common Crawl (CCBot), and ByteDance
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Bytespider
Disallow: /
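
Because new AI crawlers keep launching, it can help to generate the file from a list rather than edit it by hand. The following is a minimal sketch under that assumption: build_robots_txt is a hypothetical helper, not part of any library, and the agent list simply mirrors the user agents shown above.

```python
def build_robots_txt(agents, disallow="/"):
    """Build robots.txt text with one User-agent/Disallow group per agent."""
    groups = [f"User-agent: {agent}\nDisallow: {disallow}" for agent in agents]
    # Groups are separated by a blank line, as in the hand-written file.
    return "\n\n".join(groups) + "\n"


# Assumed list mirroring the example above; add new crawler names here.
BLOCKED_AGENTS = [
    "GPTBot",
    "ClaudeBot",
    "Google-Extended",
    "CCBot",
    "anthropic-ai",
    "Bytespider",
]

if __name__ == "__main__":
    print(build_robots_txt(BLOCKED_AGENTS), end="")
```

Adding a crawler then becomes a one-line change to the list, and the same list can be reused for every subdomain that serves its own robots.txt.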

The retroactive problem

If your content was scraped and used in training before you opted out, its influence is already baked into the model. Most labs say they cannot cleanly remove individual training data from a trained model. At best, they promise not to use the content in future training runs. This is the most common complaint from artists, writers, and other content creators.

The big idea: opt-out mechanisms exist, but they are patchy, rarely retroactive (future exclusion rests on a promise), and dependent on good-faith crawlers. Real consent requires a different default.
