robots.txt and ai.txt: The Web's Consent Signals
For three decades, a simple text file called robots.txt has been the web's main tool for regulating crawlers. The new ai.txt proposals aim to refine that mechanism for the AI era.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. A File From 1994
2. robots.txt
3. ai.txt
4. crawler consent
Section 1
A File From 1994
Martijn Koster created the Robots Exclusion Protocol in 1994. The idea is elegantly simple: every website hosts a plaintext file called robots.txt at its root. Well-behaved crawlers read it and follow its rules. It has no teeth, but it has held the web together for three decades.
The basic syntax
A minimal robots.txt
User-agent: *
Allow: /
Disallow: /private
Disallow: /admin
User-agent: Googlebot
Allow: /
Sitemap: https://example.com/sitemap.xml
How AI crawlers handle it
- GPTBot (OpenAI, launched August 2023) — respects robots.txt
- ClaudeBot (Anthropic) — respects robots.txt
- Google-Extended (Google's Gemini training) — respects robots.txt separately from search
- CCBot (Common Crawl) — respects robots.txt
- PerplexityBot — claims to respect robots.txt, though some reports suggest inconsistent compliance
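Whether a given crawler is blocked by rules like these can be checked with Python's standard-library urllib.robotparser. One caveat: urllib applies rules in file order rather than by longest match, so this sketch lists only Disallow rules and relies on the parser's default-allow behavior for unmatched paths. The GPTBot group is an illustrative addition, not part of the example file above.

```python
from urllib.robotparser import RobotFileParser

# Sample rules: a general group plus a group blocking one AI crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private
Disallow: /admin

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An ordinary browser-like agent falls under the "*" group.
print(parser.can_fetch("Mozilla/5.0", "/blog/post"))    # True
print(parser.can_fetch("Mozilla/5.0", "/private/data")) # False
# GPTBot matches its own group, which disallows everything.
print(parser.can_fetch("GPTBot", "/blog/post"))         # False
```

This is also how a well-behaved crawler uses the file in practice: fetch /robots.txt once, parse it, then consult `can_fetch` before every request.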
Enter ai.txt
Because robots.txt conflates search indexing with AI training, several proposals have emerged for a separate ai.txt file. Spawning.ai has published one proposal; related work is under discussion at the IETF. The core idea is that publishers should be able to say yes to search and no to AI training without blocking all crawlers.
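A crawler that honors such a file needs only a tiny parser. Since no ai.txt standard has been ratified, the Allow-AI/Disallow-AI directive names below are assumptions taken from the proposed format shown in this lesson, not a ratified spec:

```python
def parse_ai_txt(text: str) -> dict:
    """Map each AI use (e.g. 'training') to True (allowed) or False.

    Hypothetical Allow-AI/Disallow-AI syntax; not a standardized format.
    """
    policy = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == "allow-ai":
            policy[value.lower()] = True
        elif field.lower() == "disallow-ai":
            policy[value.lower()] = False
    return policy

AI_TXT = """\
# Example ai.txt
Allow-AI: research-noncommercial
Allow-AI: translation
Disallow-AI: training
Disallow-AI: generation
"""

policy = parse_ai_txt(AI_TXT)
print(policy.get("training", True))     # False: explicitly disallowed
print(policy.get("translation", True))  # True: explicitly allowed
```

Uses not mentioned in the file fall back to a site-chosen default (here, allow), which is itself a policy decision the proposals leave open.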
A proposed ai.txt format
# Example ai.txt
User-Agent: *
Disallow: /
# Allow specific AI uses
Allow-AI: research-noncommercial
Allow-AI: translation
Disallow-AI: training
Disallow-AI: generation
Alternative signals
- HTML meta tags: <meta name="robots" content="noai">
- HTTP headers: X-Robots-Tag: noai, noimageai
- IPTC photo metadata: DataMining: prohibited
- C2PA content credentials: cryptographically signed provenance
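The first two signals above can be detected without any special tooling. A sketch using only the standard library, with the caveat that "noai" and "noimageai" are non-standard directives that crawlers are not obliged to recognize:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if a.get("name", "").lower() == "robots":
                content = a.get("content") or ""
                self.directives += [d.strip().lower() for d in content.split(",")]

def ai_use_disallowed(html: str, x_robots_tag: str = "") -> bool:
    # "noai" is a non-standard, best-effort signal; check both the
    # X-Robots-Tag header value and any robots meta tag in the page.
    header = [d.strip().lower() for d in x_robots_tag.split(",") if d.strip()]
    meta = RobotsMetaParser()
    meta.feed(html)
    return "noai" in header or "noai" in meta.directives

page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
print(ai_use_disallowed(page))                     # True: meta tag present
print(ai_use_disallowed("<html></html>", "noai"))  # True: header present
print(ai_use_disallowed("<html></html>"))          # False: no signal
```

IPTC and C2PA signals live in media files rather than HTML, so checking them requires metadata-aware libraries beyond this sketch.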
The big idea: the web's consent infrastructure was built for a different era. Updating it for AI is an open project, and every site maintainer is a small participant in the standard we end up with.
