Bootstrapping: Confidence Without a Formula
Bootstrapping estimates the uncertainty of any statistic, even when you have no clean mathematical formula. It is simple, powerful, and surprisingly deep.
Section 1
The Magic Trick
You have 200 people's salaries and want a 95 percent confidence interval for the median. The analytic formulas for a median's confidence interval are awkward. Bradley Efron's 1979 insight: treat your sample as a stand-in for the population. Draw 10,000 resamples of 200 salaries each, sampling with replacement, compute the median of each resample, and use the 2.5th and 97.5th percentiles of those medians as your interval. That is the bootstrap (this particular version is called the percentile bootstrap).
A minimal bootstrap
Bootstrap a confidence interval
import numpy as np

data = np.array([45, 52, 60, 61, 67, 72, 78, 85, 92, 120])

def bootstrap_median(data, n_boot=10000):
    n = len(data)
    medians = np.zeros(n_boot)
    for i in range(n_boot):
        # Resample n values with replacement, then record the resample's median
        sample = np.random.choice(data, size=n, replace=True)
        medians[i] = np.median(sample)
    return medians

boots = bootstrap_median(data)
# Percentile interval: the middle 95% of the bootstrap medians
ci_low, ci_high = np.percentile(boots, [2.5, 97.5])
print(f'Median: {np.median(data):.1f}')
print(f'95% CI: [{ci_low:.1f}, {ci_high:.1f}]')
What you can bootstrap
- Any statistic: mean, median, standard deviation, IQR, correlation, regression coefficient
- Model performance metrics: accuracy, AUC, RMSE
- The difference between two groups' statistics
- Predictions from a model (by bootstrapping the training data)
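As one illustration of the list above, here is a minimal sketch of bootstrapping the difference between two groups' means. The function name `bootstrap_diff_ci`, its parameters, and both salary samples are hypothetical, introduced only for this example. It resamples each group independently and uses the same percentile interval as the median example.

```python
import numpy as np

def bootstrap_diff_ci(a, b, stat=np.mean, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for stat(a) - stat(b).

    Each group is resampled independently, with replacement,
    at its own original size.
    """
    rng = np.random.default_rng(seed)  # seeded so the run is repeatable
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        resample_a = rng.choice(a, size=a.size, replace=True)
        resample_b = rng.choice(b, size=b.size, replace=True)
        diffs[i] = stat(resample_a) - stat(resample_b)
    low, high = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return low, high

# Hypothetical salary samples (in thousands) for two teams.
team_a = [45, 52, 60, 61, 67, 72, 78, 85, 92, 120]
team_b = [40, 44, 51, 55, 58, 63, 66, 70, 74, 81]

low, high = bootstrap_diff_ci(team_a, team_b)
observed = np.mean(team_a) - np.mean(team_b)
print(f'Observed difference in means: {observed:.1f}')
print(f'95% CI: [{low:.1f}, {high:.1f}]')
```

Passing a different `stat` (for example `np.median`) reuses the same loop for another statistic. If the interval includes zero, the data are compatible with no difference between the groups.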
When bootstrap fails
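- Extreme statistics (maximum, minimum): a resample can never contain a value beyond the observed extremes, so the bootstrap systematically understates the uncertainty
- Very skewed distributions: plain percentile intervals can be inaccurate; a bias-corrected and accelerated (BCa) interval may be needed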
- Time-series: plain bootstrap destroys temporal structure (use block bootstrap)
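The first failure case is easy to see in simulation. The sketch below uses hypothetical exponential data (an assumption made only for illustration) and bootstraps the maximum. Every resampled maximum is at most the observed maximum, and a large share of resamples reproduce it exactly, so the bootstrap distribution piles up on one value instead of describing where the true maximum might lie.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 draws from an exponential distribution.
sample = rng.exponential(scale=1.0, size=200)
sample_max = sample.max()

n_boot = 10_000
boot_max = np.empty(n_boot)
for i in range(n_boot):
    resample = rng.choice(sample, size=sample.size, replace=True)
    boot_max[i] = resample.max()

# Fraction of resamples whose maximum is exactly the observed maximum.
share_at_sample_max = np.mean(boot_max == sample_max)

# Upper end of the percentile interval for the maximum.
upper = np.percentile(boot_max, 97.5)

print(f'Observed maximum: {sample_max:.3f}')
print(f'Largest bootstrap maximum: {boot_max.max():.3f}')
print(f'Share of resamples hitting the observed maximum: {share_at_sample_max:.2f}')
print(f'Upper 97.5% bound: {upper:.3f}')
```

The share of resamples that contain the observed maximum is about 1 - (1 - 1/n)^n, which is close to 0.63 for n = 200, so the upper bound of the interval sits exactly at the observed maximum and can never exceed it.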
Permutation tests: a cousin of bootstrap
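A permutation test answers: how likely is a difference at least as large as the observed one if the group labels were random? Shuffle the labels many times, compute the statistic each time, and see where your real statistic falls; the fraction of shuffles that match or exceed it is the p-value. It makes no distributional assumptions, only that the labels are exchangeable when there is no real difference.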
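The shuffling procedure can be sketched as follows for a difference in means. The function name `permutation_test`, its parameters, and the two salary samples are hypothetical. The test is two-sided, and the `+ 1` in the p-value is a common convention that counts the observed labelling as one of the permutations; it is a choice made here, not something the lesson specifies.

```python
import numpy as np

def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = np.random.default_rng(seed)  # seeded so the run is repeatable
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])

    count = 0
    for _ in range(n_perm):
        # Shuffling the pooled values is equivalent to shuffling the labels.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1

    # Assumed convention: include the observed labelling in the count.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical salary samples (in thousands) for two teams.
team_a = [45, 52, 60, 61, 67, 72, 78, 85, 92, 120]
team_b = [40, 44, 51, 55, 58, 63, 66, 70, 74, 81]

observed, p_value = permutation_test(team_a, team_b)
print(f'Observed difference in means: {observed:.1f}')
print(f'Permutation p-value: {p_value:.3f}')
```

A small p-value means few random labellings produce a difference as large as the real one. Unlike the bootstrap, which estimates uncertainty around a statistic, this procedure tests one specific null hypothesis.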
The big idea: when the math gets hard, simulation gets easy. Bootstrap lets you compute uncertainty for almost any statistic with a few lines of code. It is one of the quiet superpowers of modern data analysis.