Mean tells you the center. Variance and standard deviation tell you the spread. Without both, you are missing half the story.
Imagine two classes that both average 80 on a test. Class A: everyone scored between 78 and 82. Class B: half scored 60 and half scored 100. Same mean, very different stories. Variance and standard deviation capture that difference.
import numpy as np
class_a = [78, 79, 80, 80, 81, 82]
class_b = [60, 60, 60, 100, 100, 100]
print('Class A mean:', np.mean(class_a)) # 80
print('Class A std:', np.std(class_a)) # ~1.3
print('Class B mean:', np.mean(class_b)) # 80
print('Class B std:', np.std(class_b)) # ~20
# Same mean, very different spreads
# Class B's standard deviation is ~15x Class A's

Two classes, same mean, different spreads.

For a normal (bell-shaped) distribution, about 68 percent of values fall within one standard deviation of the mean, 95 percent within two, and 99.7 percent within three. This is the empirical rule, and it lets you eyeball whether a single value is typical or unusual.
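The rule is easy to check numerically. Below is a minimal sketch that simulates normal data with NumPy and then uses a z-score, the number of standard deviations a value sits from the mean, to flag an unusual value. The seed, and the choice of mean 80 with standard deviation 5, are illustrative assumptions, not from the lesson.

import numpy as np

rng = np.random.default_rng(0)  # seed chosen for reproducibility
data = rng.normal(loc=80, scale=5, size=100_000)  # simulated bell-shaped scores

mu, sigma = data.mean(), data.std()
for k in (1, 2, 3):
    # fraction of values within k standard deviations of the mean
    within = np.mean(np.abs(data - mu) <= k * sigma)
    print(f'within {k} std: {within:.1%}')  # ~68.3%, ~95.4%, ~99.7%

# z-score: how many standard deviations a value sits from the mean
print('z-score of 95:', (95 - mu) / sigma)  # ~3, so 95 would be unusual here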
The big idea: a dataset is a cloud, not a point. Mean locates the center; standard deviation describes the cloud's fuzziness. Always report both.
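In code, "report both" can be as simple as printing the mean plus or minus one standard deviation. A minimal sketch; the summarize helper is my own naming, not part of NumPy or the lesson.

import numpy as np

def summarize(name, data):
    # Report center and spread together: mean ± one standard deviation
    data = np.asarray(data, dtype=float)
    print(f'{name}: {data.mean():.1f} ± {data.std():.1f}')

summarize('Class A', [78, 79, 80, 80, 81, 82])  # 80.0 ± 1.3
summarize('Class B', [60, 60, 60, 100, 100, 100])  # 80.0 ± 20.0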
Quiz: 15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-variance-std-dev
1. A dataset has a mean of 50 and a standard deviation of 5. What is the z-score of the value 40?
2. Why can't the 68-95-99.7 rule be applied to income data in a country?
3. Two datasets have the same mean but very different standard deviations. What does this tell you about the datasets?
4. A feature in a machine learning dataset has a variance of 0.0001, while another feature has a variance of 500. Why might this cause problems for certain algorithms?
5. What does it mean if a dataset has a standard deviation of zero?
6. In the context of machine learning model evaluation, why is reporting only the mean accuracy insufficient?
7. A student claims that 'variance is always bigger than standard deviation.' Is this true?
8. What is the primary reason we take the square root of variance to get standard deviation?
9. In monitoring a machine learning system over time, what does an increase in a feature's variance typically signal?
10. A class has test scores with a mean of 75. Another class has the same mean but a much larger standard deviation. What does this tell you about the second class's scores?
11. If you wanted to know whether a specific value in a dataset is unusually far from the mean, which measure would be most useful?
12. Based on the 68-95-99.7 rule, approximately what percentage of values fall between two standard deviations below and two standard deviations above the mean in a normal distribution?
13. A dataset of daily stock returns has a standard deviation of 1.5%. What does this tell you about the returns?
14. Two datasets have the same variance but different means. What does this indicate about their distributions?
15. In machine learning, when is reporting standard deviation alongside mean performance metrics particularly important?