Distributions: Normal, Power-Law, and Bimodal
Data comes in shapes. The shape determines which tools you can use, and which assumptions will silently betray you.
Section 1
Every Dataset Has a Shape
Plot any column as a histogram and you will see its distribution. Some distributions are bell-shaped, some are long-tailed, some have two humps. The shape is not cosmetic; it determines what statistical tools work.
The three shapes you must know
Compare the options
| Distribution | Shape | Real example |
|---|---|---|
| Normal (Gaussian) | Symmetric bell | Human heights, measurement errors |
| Power-law | Very long right tail | City populations, YouTube views, wealth |
| Bimodal | Two humps | Commute times (car vs. transit), restaurant sizes |
Normal distributions
Normal distributions show up whenever many small, independent causes add together (Central Limit Theorem). Height is the sum of many genetic and environmental factors, each tiny. Polling errors are the sum of many small deviations. Mean and standard deviation fully describe a normal distribution.
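The additive story behind the Central Limit Theorem is easy to demonstrate. A minimal sketch with NumPy (the number of causes, the effect distribution, and the sample size are all arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each "individual" is the sum of 100 small, independent effects.
# The individual effects are uniform, not normal -- that is the point:
# the CLT says the *sums* come out bell-shaped regardless.
effects = rng.uniform(-1, 1, size=(10_000, 100))
totals = effects.sum(axis=1)

# The sums cluster symmetrically around 0 with a stable spread,
# so mean and standard deviation describe them well.
print('mean:', round(totals.mean(), 2), ' std:', round(totals.std(), 2))
```

Plot `totals` as a histogram and you get a clean bell curve, even though no single effect was normally distributed.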
Power-law distributions
Power-laws appear when outcomes multiply rather than add. Rich people can invest and get richer. Popular videos get recommended and get more popular. A tiny fraction of items (the head) dominates everything else (the long tail). In a power-law, mean is essentially meaningless because a handful of extreme values dominate.
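A quick simulation makes the head-dominates-tail point concrete. This sketch uses NumPy's Pareto sampler as a stand-in for any multiplicative process; the shape parameter and sample size are illustrative assumptions, not facts about any real dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pareto draws (shape a=1.5) as a stand-in for views, wealth, or city size.
views = rng.pareto(1.5, size=100_000) + 1

# A few giants drag the mean far above the median, so the mean
# describes almost no individual item.
print('mean:', round(views.mean(), 2), ' median:', round(np.median(views), 2))

# Share of the total held by the top 1% of items (the "head").
views.sort()
top_share = views[-1_000:].sum() / views.sum()
print(f'top 1% hold {top_share:.0%} of the total')
```

Compare this with a normal distribution, where the top 1% holds barely more than 1% of the total and the mean and median coincide.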
Bimodal distributions
A bimodal distribution has two peaks, usually because your data contains two different populations mashed together. Commute times are bimodal because car commuters and transit commuters form separate clusters; restaurant sizes are bimodal because independent cafes and chain restaurants are different populations. The fix: when you see bimodality, split the data into its two populations and analyze each separately.
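The split-then-analyze move can be sketched with synthetic commute data; every number here is made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mixture: 700 car commuters (~25 min) and
# 300 transit commuters (~55 min) stored in one column.
car = rng.normal(25, 5, size=700)
transit = rng.normal(55, 6, size=300)
commute = np.concatenate([car, transit])

# The pooled mean lands in the valley between the humps,
# where almost no actual commuter sits.
print('pooled mean:', round(commute.mean(), 1))

# Split at the valley (here ~40 min, read off the histogram)
# and summarize each population on its own.
fast = commute[commute < 40]
slow = commute[commute >= 40]
print('car-ish:', round(fast.mean(), 1), ' transit-ish:', round(slow.mean(), 1))
```

In real data you would pick the split point by inspecting the histogram, or use a proper mixture model if the humps overlap too much for a clean threshold.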
Always plot your distributions first
```python
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import skew

df = pd.read_csv('some_data.csv')

# Plot a histogram of every numeric column to see its shape
for col in df.select_dtypes('number').columns:
    df[col].hist(bins=50)
    plt.title(col)
    plt.show()

# Check for skew: |skew| > 1 means substantial asymmetry
print('Skew:', {c: skew(df[c].dropna()) for c in df.select_dtypes('number')})
```
The big idea: plot first, average later. The shape of your data tells you which tools to trust and which will give you confidently wrong answers.