robots.txt and ai.txt: The Web's Consent Signals
For 30 years, a simple text file called robots.txt has been how the web tries to regulate crawlers. The newer ai.txt proposals aim to refine this for the AI era.
Creators · AI Foundations · ~15 min read
A File From 1994
Martijn Koster created the Robots Exclusion Protocol in 1994, and it was formally standardized as RFC 9309 in 2022. The idea is simple: a website hosts a plaintext file named robots.txt at its root, and well-behaved crawlers read it and follow its rules. Compliance is voluntary, so the file has no teeth, yet the convention has held the web together for three decades.
The basic syntax
A minimal robots.txt
    User-agent: *
    Allow: /
    Disallow: /private
    Disallow: /admin

    User-agent: Googlebot
    Allow: /

    Sitemap: https://example.com/sitemap.xml
How AI crawlers handle it
- GPTBot (OpenAI, launched August 2023): respects robots.txt
- ClaudeBot (Anthropic): respects robots.txt
- Google-Extended (Google): a robots.txt token rather than a separate crawler; it controls use of content for Gemini training independently of Search
- CCBot (Common Crawl): respects robots.txt
- PerplexityBot (Perplexity): states that it respects robots.txt, but some reports have alleged inconsistent compliance
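To see how a compliant crawler would interpret per-agent rules, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The policy text, the helper name `allowed_agents`, and the example URLs are assumptions made up for illustration; real crawlers may match rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: everyone may crawl public pages, but several
# AI-related tokens named in this lesson are opted out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private
Disallow: /admin

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: CCBot
Disallow: /
"""


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user-agent token, whether the rules permit fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


AGENTS = ["Googlebot", "GPTBot", "ClaudeBot", "CCBot"]
public_page = allowed_agents(ROBOTS_TXT, AGENTS, "https://example.com/articles/1")
private_page = allowed_agents(ROBOTS_TXT, AGENTS, "https://example.com/private/notes")
print(public_page)
print(private_page)
```

Here the search crawler falls through to the `*` group, while the AI tokens match their own group and are refused everywhere. The file only expresses a preference; nothing in it stops a crawler that chooses to ignore it.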
Enter ai.txt
Because robots.txt governs which crawlers may fetch a page rather than how the fetched content is used, it tends to conflate search indexing with AI training. Several proposals have emerged for a separate ai.txt file: Spawning published one specification, and related work on AI usage preferences is under discussion at the IETF. The core idea is that publishers should be able to say yes to search and no to AI training without blocking all crawlers.
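As a rough sketch of how a tool might read per-use permissions, the snippet below parses the `Allow-AI` and `Disallow-AI` fields used in this lesson's sample file. Those field names are illustrative only and are not a ratified standard; the function name `parse_ai_txt` and the last-line-wins behavior are assumptions of this sketch.

```python
SAMPLE_AI_TXT = """\
# Example ai.txt
User-Agent: *
Disallow: /

# Allow specific AI uses
Allow-AI: research-noncommercial
Allow-AI: translation
Disallow-AI: training
Disallow-AI: generation
"""


def parse_ai_txt(text: str) -> dict[str, bool]:
    """Map each AI use named in Allow-AI / Disallow-AI lines to True or False.

    Assumption: if the same use appears twice, the later line wins.
    """
    uses: dict[str, bool] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if not value:
            continue
        if field == "allow-ai":
            uses[value.lower()] = True
        elif field == "disallow-ai":
            uses[value.lower()] = False
    return uses


permissions = parse_ai_txt(SAMPLE_AI_TXT)
print(permissions)
```

Other fields, such as `User-Agent` and `Disallow`, are ignored here; a real implementation would need to follow whichever specification the publisher actually targets.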
A proposed ai.txt format
    # Example ai.txt
    User-Agent: *
    Disallow: /

    # Allow specific AI uses
    Allow-AI: research-noncommercial
    Allow-AI: translation
    Disallow-AI: training
    Disallow-AI: generation
Alternative signals
- HTML meta tags: <meta name="robots" content="noai">
- HTTP headers: X-Robots-Tag: noai, noimageai
- IPTC photo metadata: DataMining: prohibited
- C2PA content credentials: cryptographically signed provenance
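The first two signals in the list above are plain text and easy to check. The following is a minimal sketch, using only Python's standard library, of how a crawler might look for `noai` and `noimageai` in a page's robots meta tag and its `X-Robots-Tag` header. The function and class names are hypothetical, these directives are not part of any formal standard, and the sketch ignores user-agent-scoped header values.

```python
from html.parser import HTMLParser

AI_DIRECTIVES = {"noai", "noimageai"}


def split_directives(value: str) -> set[str]:
    """Split a comma-separated directive string into lowercase tokens."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives |= split_directives(attr.get("content", ""))


def ai_opt_out_signals(html: str, headers: dict[str, str]) -> set[str]:
    """Return the AI opt-out directives found in the HTML or response headers."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = set(parser.directives)
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            found |= split_directives(value)
    return found & AI_DIRECTIVES


page = '<html><head><meta name="robots" content="noai"></head><body></body></html>'
signals = ai_opt_out_signals(page, {"X-Robots-Tag": "noai, noimageai"})
print(sorted(signals))
```

IPTC metadata and C2PA credentials live inside the media files themselves and need dedicated tooling to read, so they are left out of this sketch.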
The big idea: the web's consent infrastructure was built for a different era. Updating it for AI is an open project, and every site maintainer is a small participant in the standard we end up with.