The p-value is one of the most abused numbers in research. Here is what it actually says, and what it does not.
You see p < 0.05 in papers and headlines constantly. What does it actually mean? Precisely: if the null hypothesis were true, the probability of seeing a result at least this extreme would be less than 5 percent. It is not the probability that the null hypothesis is true.
The null hypothesis is the boring story. 'Model B is no better than model A.' 'The new prompt does not change user satisfaction.' A low p-value means the boring story would rarely produce data that looks like what you saw.
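To make the definition concrete, here is a minimal sketch of an exact one-sided p-value. The scenario is an assumption for illustration only: raters compare model B against model A in 100 head-to-head trials and prefer B in 60 of them. Under the boring story, each trial is a fair coin flip, so the p-value is the chance of 60 or more wins out of 100 from pure coin flipping. The function name `binomial_tail_p_value` is hypothetical.

```python
from math import comb


def binomial_tail_p_value(wins: int, trials: int) -> float:
    """One-sided p-value: P(X >= wins) when X ~ Binomial(trials, 0.5).

    This is the probability, under the null hypothesis of a fair coin,
    of a result at least as extreme as the one observed.
    """
    # Count the outcomes with at least `wins` successes, out of 2**trials
    # equally likely win/loss sequences.
    favorable = sum(comb(trials, k) for k in range(wins, trials + 1))
    return favorable / 2 ** trials


# Illustrative numbers (assumed, not from real data): B preferred 60 of 100 times.
p_value = binomial_tail_p_value(60, 100)
print(f"p = {p_value:.4f}")
```

The printed value falls below 0.05, which says only that fair coin flips would rarely give B this many wins. It does not say how much better B is.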
| Phrase heard | What it actually means |
|---|---|
| 'Statistically significant' | P-value below threshold, under one analysis |
| 'Not statistically significant' | Might mean no effect, or might mean not enough data |
| 'Highly significant (p<0.001)' | Data this extreme would be even rarer if the null were true, but that is still not proof of an effect |
| 'Effect size' | The number that actually matters |
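
The table's last three rows can be illustrated with a small sketch. It uses a standard pooled two-proportion z-test built from the Python standard library; the satisfaction counts are made-up assumptions, and `two_proportion_test` is a hypothetical helper name. A 10-point improvement measured on 100 users per arm fails to reach significance, while a half-point improvement measured on a million users per arm is highly significant.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(success_a, n_a, success_b, n_b):
    """Return (effect size, two-sided p-value) for a pooled two-proportion z-test.

    Effect size here is simply the difference in success rates, B minus A.
    """
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    # Under the null, both arms share one underlying rate, so pool them.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_b - rate_a, p_value


# Assumed counts: 50% vs 60% satisfied, 100 users per arm.
small_effect, small_p = two_proportion_test(50, 100, 60, 100)

# Assumed counts: 50.0% vs 50.5% satisfied, one million users per arm.
large_effect, large_p = two_proportion_test(500_000, 1_000_000, 505_000, 1_000_000)

print(f"small sample: effect = {small_effect:.3f}, p = {small_p:.3f}")
print(f"large sample: effect = {large_effect:.3f}, p = {large_p:.2e}")
```

The p-value tracks sample size as much as it tracks the effect, which is why the effect size has to be reported alongside it.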
The difference between 'significant' and 'not significant' is not itself statistically significant.
— Gelman and Stern (2006)
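A short worked example shows how this can happen. The numbers are illustrative assumptions: one study estimates an effect of 25 with a standard error of 10, another estimates 10 with the same standard error. Assuming the estimates are independent and roughly normal, the first is significant at the 0.05 level, the second is not, and the difference between them is not significant either. The helper `two_sided_p` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(estimate: float, std_error: float) -> float:
    """Two-sided p-value for a normally distributed estimate against a null of zero."""
    z = estimate / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Assumed study results (estimate, standard error).
p_first = two_sided_p(25, 10)    # z = 2.5
p_second = two_sided_p(10, 10)   # z = 1.0

# Difference between the two independent estimates:
# the variances add, so the standard error is sqrt(10^2 + 10^2).
difference = 25 - 10
difference_se = sqrt(10 ** 2 + 10 ** 2)
p_difference = two_sided_p(difference, difference_se)

print(f"study 1: p = {p_first:.3f}")
print(f"study 2: p = {p_second:.3f}")
print(f"difference: p = {p_difference:.3f}")
```

Reading "study 1 worked, study 2 did not" from these results would overstate what the data show, because the two studies are statistically compatible with each other.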
The big idea: p-values are one weak piece of evidence, often presented as if they were a verdict. Effect size and replication matter more.
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-builders-statistical-significance
What is the main idea of "Statistical Significance and P-Values"?
Which concept is most central to "Statistical Significance and P-Values"?
Which use of AI fits this topic best?
What should a careful learner remember about "What p-value is not"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about p-value be treated?
Name one way to verify an AI answer about p-value.
Which action would help you apply "Statistical Significance and P-Values" responsibly?