AI and Training Data: Where It Came From and Why It Matters
AI was trained on much of the public internet, including material people did not want used. Learn the ethics teens care about.
Section 1
The big idea
Nearly every AI model you use was trained on text and images scraped from the web. Some artists and writers consented; most did not. Lawsuits over this practice are still being decided as of 2025, and your generation will live with whatever rules win.
Some examples
- Ask Claude what Common Crawl is and how much of the web it covers.
- Ask ChatGPT which 2025 lawsuits actually won against AI companies.
- Ask Gemini what 'opt out' means for an artist in 2026 and whether it actually works.
- Ask Perplexity for examples of AI outputs that are nearly identical to training data.
Try it!
Ask Claude 'what artists are in your training data?' Notice the answer. Decide what that means for how you use AI art.
