Data comes in shapes. The shape determines which tools you can use, and which assumptions will silently betray you.
Plot any column as a histogram and you will see its distribution. Some distributions are bell-shaped, some are long-tailed, some have two humps. The shape is not cosmetic; it determines what statistical tools work.
| Distribution | Shape | Real example |
|---|---|---|
| Normal (Gaussian) | Symmetric bell | Human heights, measurement errors |
| Power-law | Very long right tail | City populations, YouTube views, wealth |
| Bimodal | Two humps | Commute times (car vs. transit), restaurant sizes |
Normal distributions show up whenever many small, independent causes add together (Central Limit Theorem). Height is the sum of many genetic and environmental factors, each tiny. Polling errors are the sum of many small deviations. Mean and standard deviation fully describe a normal distribution.
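The Central Limit Theorem claim can be checked directly with a small simulation. This is a minimal sketch, not a proof. It assumes each "small cause" is a uniform nudge between -1 and 1, and `sum_of_small_causes` is a hypothetical helper written for this illustration. The sketch adds 50 such nudges per sample, then tests the bell-curve rule of thumb: about 68% of values lie within 1 standard deviation of the mean, and about 95% lie within 2.

```python
import numpy as np

def sum_of_small_causes(n_causes: int, n_samples: int, seed: int = 0) -> np.ndarray:
    """Each sample is the sum of n_causes independent uniform(-1, 1) nudges."""
    rng = np.random.default_rng(seed)
    nudges = rng.uniform(-1.0, 1.0, size=(n_samples, n_causes))
    return nudges.sum(axis=1)

samples = sum_of_small_causes(n_causes=50, n_samples=100_000)

# For a normal distribution, about 68.3% of values fall within 1 SD of the mean
# and about 95.4% within 2 SD. Check whether the summed samples behave the same way.
mean, sd = samples.mean(), samples.std()
within_1sd = np.mean(np.abs(samples - mean) < 1 * sd)
within_2sd = np.mean(np.abs(samples - mean) < 2 * sd)
print(f"mean={mean:.2f} sd={sd:.2f}")
print(f"within 1 SD: {within_1sd:.3f} (normal: ~0.683)")
print(f"within 2 SD: {within_2sd:.3f} (normal: ~0.954)")
```

As a contrast, rerunning with `n_causes=1` gives a flat uniform distribution. In that case only about 58% of values fall within 1 SD. The bell shape comes from the adding, not from the individual causes.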
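Power laws appear when outcomes multiply rather than add, so that success feeds on itself. Rich people can invest and get richer. Popular videos get recommended and get more popular. A tiny fraction of items (the head) dominates everything else (the long tail). In a power law the mean is a poor summary. A handful of extreme values dominate it, and for heavy enough tails the sample mean never settles down no matter how much data you collect. The median is a more stable description of a typical item.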
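The simulation below shows why the mean misbehaves, using synthetic data. It is a minimal sketch that assumes a classic Pareto distribution with minimum value 1 and tail exponent `alpha = 1.1`. That exponent is an illustrative choice, not fitted to any real dataset, and the variable name `views` is hypothetical. NumPy's `Generator.pareto` draws from the Lomax (Pareto II) form, so the code adds 1 to shift it to the classic Pareto. The sketch compares the mean with the median, measures how much of the total the top 1% holds, and shows the running mean at a few sample sizes.

```python
import numpy as np

rng = np.random.default_rng(42)

# Classic Pareto samples with minimum value 1: numpy's pareto() draws from the
# Lomax (Pareto II) distribution, so adding 1 shifts it to the classic form.
# alpha = 1.1 is an illustrative assumption: a very heavy right tail.
alpha = 1.1
views = rng.pareto(alpha, size=100_000) + 1.0

# The largest 1% of items, sorted from biggest down
views_sorted = np.sort(views)[::-1]
top_1pct = views_sorted[: len(views) // 100]

print(f"median: {np.median(views):.2f}")
print(f"mean:   {views.mean():.2f}")
print(f"share of total held by top 1%: {top_1pct.sum() / views.sum():.1%}")

# The mean of a growing prefix jumps whenever a new extreme value arrives.
for n in (1_000, 10_000, 100_000):
    print(f"mean of first {n:>7,} samples: {views[:n].mean():.2f}")
```

With this exponent the theoretical median is about 1.88, while the theoretical mean is 11. The sample mean drifts as rare extreme values arrive, but the median barely moves.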
A bimodal distribution has two peaks, usually because your data mixes two different populations. Commute times are bimodal because car commuters and transit commuters form separate groups. Restaurant sizes are bimodal because independent cafes and chain restaurants do. The trick: when you see bimodality, split the data into the two populations and analyze each separately.
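The splitting advice can be sketched on synthetic commute times. The data are invented for illustration: car commuters average about 25 minutes and transit commuters about 55. If no column already says which population each row belongs to, something has to find the split. This sketch uses one-dimensional k-means with two clusters. That is a swapped-in technique, not one the lesson prescribes, and a Gaussian mixture model is a common alternative. `split_two_groups` is a hypothetical helper. The sketch also shows that the overall mean lands in the valley between the humps, where few commuters actually are.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic commute times in minutes (illustrative assumption, not real data)
car = rng.normal(25, 5, size=3_000)
transit = rng.normal(55, 8, size=2_000)
commutes = np.concatenate([car, transit])

def split_two_groups(x: np.ndarray, n_iter: int = 50) -> np.ndarray:
    """1-D k-means with k=2. Returns a boolean mask: True = upper group."""
    lo, hi = x.min(), x.max()          # start the two centers at the extremes
    for _ in range(n_iter):
        threshold = (lo + hi) / 2      # each point goes to the nearer center
        upper = x > threshold
        new_lo, new_hi = x[~upper].mean(), x[upper].mean()
        if np.isclose(new_lo, lo) and np.isclose(new_hi, hi):
            break                      # centers stopped moving: converged
        lo, hi = new_lo, new_hi
    return upper

upper = split_two_groups(commutes)
overall = commutes.mean()
print(f"overall mean: {overall:.1f} min")
print(f"group A: n={(~upper).sum()}, mean={commutes[~upper].mean():.1f} min")
print(f"group B: n={upper.sum()}, mean={commutes[upper].mean():.1f} min")

# The overall mean sits between the humps, where few people actually are.
near_overall = np.mean(np.abs(commutes - overall) < 2.5)
print(f"share within 2.5 min of the overall mean: {near_overall:.1%}")
```

Each group's mean describes a real kind of commuter. The overall mean (about 37 minutes here) describes almost nobody.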
```python
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import kurtosis, skew

df = pd.read_csv('some_data.csv')
numeric_cols = df.select_dtypes('number').columns

# Plot a histogram of every numeric column to see its shape
for col in numeric_cols:
    df[col].hist(bins=50)
    plt.title(col)
    plt.show()

# Quantify the shape; dropna() keeps missing values from turning the result into nan
print('Skew:', {c: skew(df[c].dropna()) for c in numeric_cols})
print('Excess kurtosis:', {c: kurtosis(df[c].dropna()) for c in numeric_cols})
# |skew| > 1 means substantial asymmetry; excess kurtosis well above 0 means heavier tails than a normal
```

Always plot your distributions first

The big idea: plot first, average later. The shape of your data tells you which tools to trust and which will give you confidently wrong answers.
6 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-distributions
What is the main idea of "Distributions: Normal, Power-Law, and Bimodal"?
Which concept is most central to "Distributions: Normal, Power-Law, and Bimodal"?
What should a careful learner remember about "Power-law trap"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about distributions be treated?
Name one way to verify an AI answer about distributions.