Correlation vs. Causation
The most famous warning in statistics is also the most ignored. Here is how to actually tell them apart.
Section 1
Two Things Move Together
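Correlation means two variables tend to move together: when one rises, the other tends to rise too (or, for a negative correlation, to fall). Causation means that changing one of them actually produces a change in the other. Correlations are easy to find. Causation is much harder to establish.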
Why they get confused
- Ice cream sales and drowning both rise in summer — not because ice cream drowns you
- Users who open the chatbot more convert more — but causation could run either way
- Countries with more storks have higher birth rates — confounded by rural/urban differences
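To make the ice cream example concrete, here is a minimal simulation sketch. Every number in it (the temperature distribution, the slopes, the noise levels) is a made-up assumption for illustration, not real data. Temperature drives both series, and neither series affects the other. The raw correlation still comes out strong, and it vanishes once temperature is accounted for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing its least-squares linear fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical world: temperature is the confounder behind both series.
temperature = [rng.gauss(20, 8) for _ in range(5000)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 3) for t in temperature]

# Strongly positive, even though ice cream never enters the drowning formula.
raw_r = pearson(ice_cream, drownings)

# Correlation after removing the part of each series explained by temperature.
partial_r = pearson(
    residuals(ice_cream, temperature), residuals(drownings, temperature)
)

print(f"raw correlation:         {raw_r:.2f}")
print(f"after removing the heat: {partial_r:.2f}")
```

The second number should land close to zero. The catch in real data is that you can only adjust for confounders you have thought of and measured.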
What actually establishes causation
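1. Randomized controlled trial: you assign the treatment at random, so no confounder can influence who receives it (the gold standard)
2. Natural experiments: accidental random assignment in the wild
3. Instrumental variables: a variable that affects the suspected cause A and influences the outcome B only through A
4. Causal inference frameworks (Pearl's do-calculus, potential outcomes): formal tools for stating the assumptions under which observational data can support a causal claim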
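Why is randomization the gold standard? The sketch below simulates a hypothetical product where a hidden trait, here called `motivated`, makes users both more likely to adopt a feature and more likely to convert. The true effect of the feature is fixed at `TRUE_EFFECT`, and every probability in the code is an invented assumption. Comparing adopters with non-adopters overstates the effect; assigning the feature by coin flip recovers it.

```python
import random

TRUE_EFFECT = 0.05  # assumed lift in conversion probability from the feature


def converts(rng, motivated, has_feature):
    """Simulate one user's conversion under the assumed probabilities."""
    p = 0.10
    if motivated:
        p += 0.30  # motivated users convert more, with or without the feature
    if has_feature:
        p += TRUE_EFFECT
    return rng.random() < p


def simulate(rng, n, randomized):
    """Return a list of (has_feature, converted) pairs for n users."""
    rows = []
    for _ in range(n):
        motivated = rng.random() < 0.5
        if randomized:
            # Coin flip: assignment cannot depend on motivation.
            has_feature = rng.random() < 0.5
        else:
            # Self-selection: motivated users adopt the feature far more often.
            has_feature = rng.random() < (0.8 if motivated else 0.2)
        rows.append((has_feature, converts(rng, motivated, has_feature)))
    return rows


def rate_difference(rows):
    """Conversion rate with the feature minus conversion rate without it."""
    treated = [converted for has_feature, converted in rows if has_feature]
    control = [converted for has_feature, converted in rows if not has_feature]
    return sum(treated) / len(treated) - sum(control) / len(control)


rng = random.Random(1)  # fixed seed so the run is repeatable
observational_diff = rate_difference(simulate(rng, 200_000, randomized=False))
randomized_diff = rate_difference(simulate(rng, 200_000, randomized=True))

print(f"observational difference: {observational_diff:.3f}")
print(f"randomized difference:    {randomized_diff:.3f}")
print(f"true effect:              {TRUE_EFFECT:.3f}")
```

Under these assumptions the observational comparison mixes the feature's effect with the effect of motivation, so it comes out several times larger than the true lift. The randomized comparison stays close to `TRUE_EFFECT` because the coin flip balances motivation across both groups.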
Compare the options
| Correlational claim | Causal claim |
|---|---|
| Users who use feature X convert more | Launching feature X will increase conversions |
| Models with more parameters score higher | Adding parameters would raise this model's score |
| People who read self-help books are happier | Reading self-help makes people happier |
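The gap between the two columns can be written down in potential-outcomes notation. This is a sketch of the standard decomposition, with symbols introduced here for illustration: T = 1 for people who received the treatment (used the feature, read the book) and T = 0 otherwise, Y is the observed outcome, and Y(1) and Y(0) are the outcomes the same person would have with and without the treatment.

```latex
\underbrace{E[Y \mid T=1] - E[Y \mid T=0]}_{\text{what the data shows}}
= \underbrace{E[Y(1) - Y(0) \mid T=1]}_{\text{causal effect on the treated}}
+ \underbrace{E[Y(0) \mid T=1] - E[Y(0) \mid T=0]}_{\text{selection bias}}
```

The left column of the table reports the left-hand side. The right column is a claim about the first term on the right. The two agree only when the selection bias term is zero, meaning the treated and untreated groups would have looked the same without the treatment.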
“Correlation does not imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.”
The big idea: correlations are the fuel of science, but not its conclusion. Ask what experiment would distinguish the stories before you believe one.
