AI and Training Data: Where It Came From and Why It Matters
AI models were trained on huge amounts of the public internet, including work people never wanted used that way. Learn the ethics teens care about.
Builders · AI Foundations · ~4 min read
The big idea
Most AI models you use were trained on text and images scraped from the web. Some artists and writers consented; most were never asked. Many of the lawsuits over this were still undecided as of 2025, and your generation will live with whatever rules win.
Some examples
- Ask Claude what Common Crawl is and how much of the web it covers.
- Ask ChatGPT which lawsuits against AI companies were actually decided in 2025, and who won.
- Ask Gemini what 'opt out' means for an artist in 2026 and whether it actually works.
- Ask Perplexity for examples of AI outputs that are nearly identical to training data.
Try it!
Ask Claude 'what artists are in your training data?' Notice the answer. Decide what that means for how you use AI art.
Practice this safely
Try this with a school, hobby, or family example where the stakes are low. Use the AI output as a draft you can question, not as the final answer.
1. Ask AI to explain training data in plain language, then underline anything that sounds uncertain or too broad.
2. Give it one detail from "AI and Training Data: Where It Came From and Why It Matters" and ask for two possible next steps plus one reason each step might be wrong.
3. Check any copyright claim the AI makes against a trusted source, teacher, adult, expert, or original document before you use it.
