Sampling Bias
If your sample is skewed, your conclusion is skewed. Here is how to spot it.
Section 1
Who Did You Ask?
Every data-driven claim rests on the sample it was drawn from. If the sample is not representative of what you claim to describe, the conclusion is corrupted before the math even starts.
Famous examples
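- 1936 Literary Digest poll: predicted Landon in a landslide, but Roosevelt won. The mailing list was drawn largely from car registrations and telephone directories, which over-represented wealthier households.
- WWII survivorship bias: Abraham Wald noticed that the bullet holes on returning planes marked the places a plane could be hit and still make it home, so the armor belonged on the spots with no holes.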
- Online reviews over-represent extreme experiences (1-star angry or 5-star delighted)
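The polling failure above can be reproduced in a few lines. The following is a minimal sketch, not a historical reconstruction: the population, the 30% ownership rate, and the 35% / 65% support rates are made-up numbers chosen only to show how sampling from one subgroup flips the result. `simulate_poll` is a hypothetical helper name.

```python
import random


def simulate_poll(seed=0, population_size=100_000, sample_size=2_000):
    """Compare a random sample with a sample drawn only from owners.

    All rates below are illustrative assumptions, not historical data.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(population_size):
        # Assumption: 30% of voters own a phone or car.
        owns_phone_or_car = rng.random() < 0.30
        # Assumption: owners back candidate A less often than non-owners.
        p_candidate_a = 0.35 if owns_phone_or_car else 0.65
        votes_a = rng.random() < p_candidate_a
        population.append((owns_phone_or_car, votes_a))

    def share_a(people):
        # Fraction of the given group that votes for candidate A.
        return sum(votes for _, votes in people) / len(people)

    # Representative sample: every voter is equally likely to be asked.
    random_sample = rng.sample(population, sample_size)
    # Biased sample: only owners can be reached.
    owners_only = [person for person in population if person[0]]
    biased_sample = rng.sample(owners_only, sample_size)

    return share_a(population), share_a(random_sample), share_a(biased_sample)


if __name__ == "__main__":
    true_share, random_share, biased_share = simulate_poll()
    print(f"true share for A:        {true_share:.3f}")
    print(f"random sample estimate:  {random_share:.3f}")
    print(f"owners-only estimate:    {biased_share:.3f}")
```

Under these assumptions candidate A wins with roughly 56% of the whole population, and the random sample lands close to that. The owners-only sample points to a clear loss for A, even though it is the same size. A larger sample does not repair a biased sampling frame.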
Common AI versions
- Training data over-represents English-speaking, internet-active people
- Benchmark curators skew toward their own cultures and topics
- LMArena votes come disproportionately from tech-savvy users
- Released models are the survivors; failures never ship
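The last point, that released models are survivors, can be sketched as a small simulation. This is a toy model under stated assumptions: every number (quality around 60, evaluation noise of 5 points, a ship threshold of 70) is invented for illustration, and `simulate_releases` is a hypothetical helper, not a description of how any lab decides what to ship.

```python
import random
from statistics import mean


def simulate_releases(seed=0, n_candidates=10_000, ship_threshold=70.0):
    """Ship only candidates whose noisy benchmark score clears a threshold.

    All distributions and the threshold are illustrative assumptions.
    """
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_candidates):
        # Assumption: true quality varies around 60 points.
        true_quality = rng.gauss(60.0, 5.0)
        # Assumption: the benchmark adds independent noise to true quality.
        observed_score = true_quality + rng.gauss(0.0, 5.0)
        candidates.append((true_quality, observed_score))

    # Only candidates that look good on the benchmark are released.
    shipped = [c for c in candidates if c[1] >= ship_threshold]

    return {
        "all_true_mean": mean(t for t, _ in candidates),
        "shipped_true_mean": mean(t for t, _ in shipped),
        "shipped_observed_mean": mean(o for _, o in shipped),
        "shipped_fraction": len(shipped) / n_candidates,
    }


if __name__ == "__main__":
    for name, value in simulate_releases().items():
        print(f"{name}: {value:.3f}")
```

In this sketch the shipped group's published scores sit above its own true quality, which in turn sits above the average of everything that was built. Someone who sees only the released models would overestimate both how good a typical attempt is and how good the survivors really are.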
Compare the options
| Biased source | What you actually learn |
|---|---|
| Only your customers | How loyal users feel, not how strangers would react |
| Only Reddit posts | What Reddit-posting people think |
| Only English Wikipedia | What English editors could agree on |
| Only passing tests | What the test curriculum rewards |
“The bullet holes in the plane are where the plane can take a hit and still fly home.”
The big idea: always ask 'who is in this sample?' before asking 'what does this sample say?'
