Resampling techniques draw new samples from your data to estimate uncertainty, balance classes, or validate models. Resampling is one of the most underused superpowers in statistics.
You have 1,000 data points. A single train/test split gives you one estimate of model accuracy. But what if you happen to get a lucky or unlucky split? Resampling lets you run the experiment many times, getting more reliable answers from the same data.
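To make the "lucky or unlucky split" point concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the 1,000 points are simulated, and the "model" is reduced to a list saying whether it got each point right (about 80 percent of the time). The helper `split_accuracy` is a hypothetical name, not a library function.

```python
import random
import statistics

random.seed(0)

# Hypothetical setup: 1,000 points and a fixed model that is right on
# roughly 80% of them. correct[i] is True if the model gets point i right.
n = 1000
correct = [random.random() < 0.8 for _ in range(n)]

def split_accuracy(correct, test_fraction, rng):
    """Accuracy measured on one randomly chosen test split."""
    test_size = int(len(correct) * test_fraction)
    test_idx = rng.sample(range(len(correct)), test_size)
    return sum(correct[i] for i in test_idx) / test_size

# Repeat the "single 80/20 split" experiment 200 times
rng = random.Random(1)
estimates = [split_accuracy(correct, 0.2, rng) for _ in range(200)]

print(f"lowest split:  {min(estimates):.3f}")
print(f"highest split: {max(estimates):.3f}")
print(f"mean of 200:   {statistics.mean(estimates):.3f}")
```

The same model on the same data gets a noticeably different score depending on which points land in the test set. Averaging over many splits pulls the estimate back toward the model's overall accuracy.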
| Technique | Purpose | Key idea |
|---|---|---|
| K-fold CV | Model evaluation | Split data into k parts, train on k-1, test on 1, rotate |
| Leave-one-out | Model evaluation, tiny datasets | Train on n-1, test on 1, repeat n times |
| Stratified sampling | Preserve class balance | Sample within each class separately |
| Bootstrap | Estimate uncertainty | Sample with replacement, many times |
| Permutation | Hypothesis testing | Shuffle labels, re-compute stat |
| SMOTE | Class imbalance | Generate synthetic minority examples |
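The bootstrap and permutation rows of the table can be sketched with the standard library alone. This is a simplified illustration, not a reference implementation: `bootstrap_ci` and `permutation_p_value` are hypothetical helper names, the interval uses the basic percentile method, and the two groups are made-up numbers.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n items WITH replacement, then recomputes the statistic
    stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # shuffling breaks any link between group and value
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Adding 1 to both counts includes the observed arrangement itself
    return (hits + 1) / (n_permutations + 1)

# Made-up measurements for two groups (illustrative only)
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.7]
group_b = [4.2, 4.0, 4.6, 4.4, 4.1, 4.8, 4.3, 4.5]

low, high = bootstrap_ci(group_a)
p = permutation_p_value(group_a, group_b)

print(f"95% bootstrap CI for mean of group_a: ({low:.2f}, {high:.2f})")
print(f"permutation p-value for difference in means: {p:.4f}")
```

The bootstrap answers "how much would this statistic move if I had drawn a different sample?", while the permutation test answers "how often would shuffled labels produce a gap this large?".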
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_data()  # placeholder: supply your own feature matrix and labels
model = LogisticRegression()

# cv=5 trains and tests the model on 5 different folds, giving 5 accuracy numbers
scores = cross_val_score(model, X, y, cv=5)

# The mean estimates accuracy on unseen data; the std shows the fold-to-fold spread
print(f'Accuracy: {scores.mean():.3f} +/- {scores.std():.3f}')
```

5-fold cross-validation

If 99 percent of your data is class A and 1 percent is class B (fraud detection, rare disease), a naive model can predict A every time and hit 99 percent accuracy while being useless. SMOTE (Synthetic Minority Oversampling Technique) generates realistic new minority examples by interpolating between existing ones.
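The interpolation idea can be sketched in a few lines of plain Python. This is a simplified illustration of the principle, assuming 2-D points stored as tuples; it is not the imbalanced-learn implementation, and `smote_sketch` is a hypothetical name. Each synthetic point sits somewhere on the line segment between a minority point and one of its nearest minority neighbours.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of base (excluding base itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        t = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + t * (q - b) for b, q in zip(base, neighbour)))
    return synthetic

# Made-up minority-class points (illustrative only)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.8)]
new_points = smote_sketch(minority, n_new=10)

for point in new_points:
    print(tuple(round(c, 3) for c in point))
```

Because every new point is a blend of two real minority points, the synthetic examples stay inside the region the minority class already occupies rather than being exact copies.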
```python
import numpy as np
from imblearn.over_sampling import SMOTE

# X_train, y_train are assumed to be your training split
X_resampled, y_resampled = SMOTE().fit_resample(X_train, y_train)

# Compare class counts before and after resampling
print('Before:', dict(zip(*np.unique(y_train, return_counts=True))))
print('After:', dict(zip(*np.unique(y_resampled, return_counts=True))))
```

SMOTE for class balancing

The big idea: a single train/test split is rarely enough. Resampling turns one experiment into many, giving you honest uncertainty estimates and squeezing more learning from limited data.
6 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-resampling
What is the main idea of "Resampling: Making Data Work Harder"?
Which concept is most central to "Resampling: Making Data Work Harder"?
What should a careful learner remember about "SMOTE cautions"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about resampling be treated?
Name one way to verify an AI answer about resampling.