Data comes in shapes. The shape determines which tools you can use, and which assumptions will silently betray you.
Plot any column as a histogram and you will see its distribution. Some distributions are bell-shaped, some are long-tailed, some have two humps. The shape is not cosmetic; it determines what statistical tools work.
| Distribution | Shape | Real example |
|---|---|---|
| Normal (Gaussian) | Symmetric bell | Human heights, measurement errors |
| Power-law | Very long right tail | City populations, YouTube views, wealth |
| Bimodal | Two humps | Commute times (car vs. transit), restaurant sizes |
Normal distributions show up whenever many small, independent causes add together (Central Limit Theorem). Height is the sum of many genetic and environmental factors, each tiny. Polling errors are the sum of many small deviations. Mean and standard deviation fully describe a normal distribution.
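The Central Limit Theorem is easy to see in simulation. This sketch (all names and parameters are illustrative, not from the lesson) sums many small, independent contributions that are individually *not* normal, and checks that the sums behave like a bell curve:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each "outcome" is the sum of 1000 tiny, independent contributions
# drawn from a decidedly non-normal (uniform) distribution.
samples = rng.uniform(-1, 1, size=(100_000, 1000)).sum(axis=1)

# By the CLT the sums are approximately normal, so mean and
# standard deviation describe them well.
mean, std = samples.mean(), samples.std()

# Roughly 68% of a normal distribution falls within one standard
# deviation of the mean.
within_1sd = np.mean(np.abs(samples - mean) < std)
print(f"mean={mean:.2f}, std={std:.2f}, within 1 sd: {within_1sd:.2%}")
```

Despite starting from a flat distribution, the fraction within one standard deviation lands near the normal distribution's 68%.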
Power-laws appear when outcomes multiply rather than add. Rich people can invest and get richer. Popular videos get recommended and get more popular. A tiny fraction of items (the head) dominates everything else (the long tail). In a power-law, mean is essentially meaningless because a handful of extreme values dominate.
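A quick simulation shows why the mean misleads here. This sketch draws heavy-tailed "view counts" from a Pareto distribution (the shape parameter and names are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pareto-distributed "views": a heavy right tail (shape a=1.1,
# chosen purely for illustration).
views = rng.pareto(1.1, size=100_000) + 1

# The top 1% of items accounts for a large share of the total,
# and the mean sits far above the typical (median) value.
top_1pct_share = np.sort(views)[-1000:].sum() / views.sum()
print(f"mean={views.mean():.1f}, median={np.median(views):.1f}, "
      f"top 1% share={top_1pct_share:.0%}")
```

The median describes the typical item; the mean is dragged upward by a handful of extreme values, which is exactly why "average views" is a misleading summary of a power-law.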
A bimodal distribution has two peaks, usually because your data contains two different populations mashed together. Commute times are bimodal: car commuters versus transit commuters. Restaurant sizes are bimodal: independent cafes versus chain restaurants. The trick: if you see bimodality, split the data into its two populations and analyze each separately.
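The commute example can be sketched as follows (the populations, means, and the split point of 40 minutes are all hypothetical, invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical commute times: two populations mashed together.
car = rng.normal(25, 5, size=5_000)      # car commuters, ~25 min
transit = rng.normal(55, 8, size=5_000)  # transit commuters, ~55 min
commutes = np.concatenate([car, transit])

# The pooled mean (~40 min) describes almost nobody...
pooled_mean = commutes.mean()

# ...but splitting at the valley between the two humps recovers
# two sensible per-group averages.
split = 40
car_mean = commutes[commutes < split].mean()
transit_mean = commutes[commutes >= split].mean()
print(f"pooled={pooled_mean:.0f}, car={car_mean:.0f}, transit={transit_mean:.0f}")
```

The pooled mean falls in the valley between the humps, a value almost no individual commuter actually experiences; the per-group means are the meaningful summaries.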
```python
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import skew

df = pd.read_csv('some_data.csv')

# Plot a histogram of each numeric column to see its shape
for col in df.select_dtypes('number').columns:
    df[col].hist(bins=50)
    plt.title(col)
    plt.show()

# Check for skew: |skew| > 1 means substantial asymmetry
print('Skew:', {c: skew(df[c]) for c in df.select_dtypes('number')})
```

**Always plot your distributions first.** The big idea: plot first, average later. The shape of your data tells you which tools to trust and which will give you confidently wrong answers.