For 30 years, a simple text file called robots.txt has been how the web tries to regulate crawlers. The newer ai.txt proposals aim to refine that mechanism for the AI era.
Martijn Koster created the Robots Exclusion Protocol in 1994. The idea is elegantly simple: every website hosts a plaintext file called robots.txt at its root. Well-behaved crawlers read it and follow its rules. It has no teeth, but it has held the web together for three decades.
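To make "read it and follow its rules" concrete, here is a minimal sketch of the check a well-behaved crawler might run before fetching a page, using Python's standard-library urllib.robotparser. The crawler name ExampleBot, the helper may_crawl, and the embedded rules are assumptions for illustration; a real crawler would download the file from the site's root instead. The Disallow lines are listed first so the result does not depend on how a given parser resolves overlapping rules.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only; a real crawler would fetch them from
# https://example.com/robots.txt rather than embed them.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private
Disallow: /admin

User-agent: Googlebot
Allow: /
"""

# Parse the rules once, then reuse the parser for every URL.
_parser = RobotFileParser()
_parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return _parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/index.html"),
        ("ExampleBot", "https://example.com/private/report"),
        ("Googlebot", "https://example.com/private/report"),
    ]:
        print(agent, url, may_crawl(agent, url))
```

Nothing in this check is enforced by the server: a crawler that never calls may_crawl can still request every URL, which is why the protocol depends on good behavior.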
    User-agent: *
    Allow: /
    Disallow: /private
    Disallow: /admin

    User-agent: Googlebot
    Allow: /

    Sitemap: https://example.com/sitemap.xml

A minimal robots.txt

Because robots.txt conflates search indexing with AI training, several proposals have emerged for a separate ai.txt file. Spawning.ai published one standard; others are under discussion at the IETF. The core idea is that publishers should be able to say yes to search and no to AI training without blocking all crawlers.
    # Example ai.txt
    User-Agent: *
    Disallow: /

    # Allow specific AI uses
    Allow-AI: research-noncommercial
    Allow-AI: translation
    Disallow-AI: training
    Disallow-AI: generation

A proposed ai.txt format

The big idea: the web's consent infrastructure was built for a different era. Updating it for AI is an open project, and every site maintainer is a small participant in the standard we end up with.
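Because the ai.txt format shown earlier is only a proposal, no standard parser exists for it. As a rough sketch of how a crawler might interpret the Allow-AI and Disallow-AI lines, the Python below reads them into a lookup table. The function names parse_ai_txt and is_use_allowed are hypothetical, and treating an undeclared use as disallowed is an assumption of this sketch, not something the proposal specifies.

```python
# The sample file from the lesson, embedded for illustration.
AI_TXT = """\
# Example ai.txt
User-Agent: *
Disallow: /

# Allow specific AI uses
Allow-AI: research-noncommercial
Allow-AI: translation
Disallow-AI: training
Disallow-AI: generation
"""


def parse_ai_txt(text: str) -> dict:
    """Map each declared AI use to True (allowed) or False (disallowed)."""
    permissions = {}
    for raw_line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "allow-ai":
            permissions[value.lower()] = True
        elif field == "disallow-ai":
            permissions[value.lower()] = False
        # Other fields (User-Agent, Disallow) are ignored in this sketch.
    return permissions


def is_use_allowed(permissions: dict, use: str, default: bool = False) -> bool:
    """Look up a use; undeclared uses fall back to default (assumed deny)."""
    return permissions.get(use.lower(), default)


if __name__ == "__main__":
    perms = parse_ai_txt(AI_TXT)
    for use in ("translation", "training", "summarization"):
        print(use, is_use_allowed(perms, use))
```

As with robots.txt, a file like this only expresses a preference; honoring it is up to whoever runs the crawler.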
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-robots-txt-ai-txt
What is the main idea of "robots.txt and ai.txt: The Web's Consent Signals"?
Which concept is most central to "robots.txt and ai.txt: The Web's Consent Signals"?
Which use of AI fits this topic best?
What should a careful learner remember about "The enforcement problem"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about robots.txt be treated?
Name one way to verify an AI answer about robots.txt.
Which action would help you apply "robots.txt and ai.txt: The Web's Consent Signals" responsibly?