Mean tells you the center. Variance and standard deviation tell you the spread. Without both, you are missing half the story.
Imagine two classes that both average 80 on a test. Class A: everyone scored between 78 and 82. Class B: half scored 60 and half scored 100. Same mean, very different stories. Variance and standard deviation capture that difference.
import numpy as np
class_a = [78, 79, 80, 80, 81, 82]
class_b = [60, 60, 60, 100, 100, 100]
print('Class A mean:', np.mean(class_a)) # 80
print('Class A std:', np.std(class_a)) # ~1.3
print('Class B mean:', np.mean(class_b)) # 80
print('Class B std:', np.std(class_b)) # ~20
# Same mean, very different spreads
# Class B's standard deviation is ~15x Class A's

Two classes, same mean, different spreads.

For a normal (bell-shaped) distribution, about 68 percent of values fall within one standard deviation of the mean, 95 percent within two, and 99.7 percent within three. This is the empirical rule, and it lets you eyeball whether a single value is typical or unusual.
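The rule is easy to check numerically. Below is a minimal sketch that simulates normal data with NumPy and then uses a z-score, the number of standard deviations a value sits from the mean, to flag an unusual value. The seed, and the choice of mean 80 with standard deviation 5, are illustrative assumptions, not from the lesson.

import numpy as np

rng = np.random.default_rng(0)  # seed chosen for reproducibility
data = rng.normal(loc=80, scale=5, size=100_000)  # simulated bell-shaped scores

mu, sigma = data.mean(), data.std()
for k in (1, 2, 3):
    # fraction of values within k standard deviations of the mean
    within = np.mean(np.abs(data - mu) <= k * sigma)
    print(f'within {k} std: {within:.1%}')  # ~68.3%, ~95.4%, ~99.7%

# z-score: how many standard deviations a value sits from the mean
print('z-score of 95:', (95 - mu) / sigma)  # ~3, so 95 would be unusual here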
The big idea: a dataset is a cloud, not a point. Mean locates the center; standard deviation describes the cloud's fuzziness. Always report both.
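In code, "report both" can be as simple as printing the mean plus or minus one standard deviation. A minimal sketch; the summarize helper is my own naming, not part of NumPy or the lesson.

import numpy as np

def summarize(name, data):
    # Report center and spread together: mean ± one standard deviation
    data = np.asarray(data, dtype=float)
    print(f'{name}: {data.mean():.1f} ± {data.std():.1f}')

summarize('Class A', [78, 79, 80, 80, 81, 82])  # 80.0 ± 1.3
summarize('Class B', [60, 60, 60, 100, 100, 100])  # 80.0 ± 20.0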
Quiz: 15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-variance-std-dev
1. A dataset has a mean of 50 and a standard deviation of 5. What is the z-score of the value 40?
2. Why can't the 68-95-99.7 rule be applied to income data in a country?
3. Two datasets have the same mean but very different standard deviations. What does this tell you about the datasets?
4. A feature in a machine learning dataset has a variance of 0.0001, while another feature has a variance of 500. Why might this cause problems for certain algorithms?
5. What does it mean if a dataset has a standard deviation of zero?
6. In the context of machine learning model evaluation, why is reporting only the mean accuracy insufficient?
7. A student claims that 'variance is always bigger than standard deviation.' Is this true?
8. What is the primary reason we take the square root of variance to get standard deviation?
9. In monitoring a machine learning system over time, what does an increase in a feature's variance typically signal?
10. A class has test scores with a mean of 75. Another class has the same mean but a much larger standard deviation. What does this tell you about the second class's scores?
11. If you wanted to know whether a specific value in a dataset is unusually far from the mean, which measure would be most useful?
12. Based on the 68-95-99.7 rule, approximately what percentage of values fall between two standard deviations below and two standard deviations above the mean in a normal distribution?
13. A dataset of daily stock returns has a standard deviation of 1.5%. What does this tell you about the returns?
14. Two datasets have the same variance but different means. What does this indicate about their distributions?
15. In machine learning, when is reporting standard deviation alongside mean performance metrics particularly important?