Missing Data and How to Spot It

Real datasets have holes. Blank cells, NaN, NULL, -999, and the dreaded empty string. Learning to see them is a core skill.

25 min · Reviewed 2026

Data Has Holes

In a perfect world, every row would have every column filled in. In reality, datasets are full of gaps. A survey respondent skipped a question. A sensor cut out for three seconds. A database migration dropped a field. All of this creates missing data.
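
Gaps do not always show up as NaN. As a minimal sketch of how the disguised forms could be counted, the snippet below reads a small made-up CSV with pandas' automatic NaN conversion switched off, then checks each cell against a set of tokens. The file contents and the `MISSING_TOKENS` set are assumptions for illustration, not part of any real dataset.

```python
import io
import pandas as pd

# Hypothetical survey extract; every value here is made up for illustration.
raw = io.StringIO(
    "id,age,income,city\n"
    "1,34,52000,Austin\n"
    "2,-999,,Boston\n"
    "3,41,NULL,\n"
    "4,29,61000,   \n"
)

# dtype=str and keep_default_na=False turn off pandas' automatic NaN
# conversion, so we see exactly what the file contains.
df = pd.read_csv(raw, dtype=str, keep_default_na=False)

# Tokens assumed to mean "missing" in this file; adjust for your own data.
MISSING_TOKENS = {"", "NULL", "NaN", "N/A", "-999"}

# Strip whitespace first so cells like "   " count as empty.
disguised = df.apply(lambda col: col.str.strip().isin(MISSING_TOKENS))

# Number of disguised missing values in each column.
missing_per_column = disguised.sum()
print(missing_per_column)
```

Reading everything as text first is only a diagnostic step. Once the tokens are known, they can be passed to `read_csv` through `na_values` so that pandas converts them to NaN on load.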

Three flavors of missingness

MCAR — Missing Completely At Random: a sensor glitched. The gap has nothing to do with the value.
MAR — Missing At Random: men are less likely to answer a health survey question. The missingness depends on another column (gender) but not on the answer itself.
MNAR — Missing Not At Random: people with very high incomes refuse to report their income. The value itself causes the missingness. This is the dangerous one.
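
To make the difference concrete, here is a small simulation sketch using only the Python standard library. The income distribution, the reporting probabilities, and the `is_male` column are all invented for illustration. In this sketch income is generated independently of `is_male`, so only the MNAR case is expected to pull the observed mean away from the true one.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: 10,000 incomes plus a second column (is_male).
n = 10_000
incomes = [random.lognormvariate(10.5, 0.6) for _ in range(n)]
is_male = [random.random() < 0.5 for _ in range(n)]

def observed_mean(keep_probability):
    """Mean of the incomes that get reported, given a per-row reporting chance."""
    kept = [x for i, x in enumerate(incomes) if random.random() < keep_probability(i)]
    return statistics.mean(kept)

true_mean = statistics.mean(incomes)
cutoff = sorted(incomes)[int(0.9 * n)]  # threshold for the top 10% of incomes

# MCAR: every row has the same 70% chance of being reported.
mcar_mean = observed_mean(lambda i: 0.7)
# MAR: the chance depends on another column (is_male), not on income.
mar_mean = observed_mean(lambda i: 0.5 if is_male[i] else 0.9)
# MNAR: the chance depends on the income value itself.
mnar_mean = observed_mean(lambda i: 0.2 if incomes[i] >= cutoff else 0.9)

print(f"true mean: {true_mean:,.0f}")
print(f"MCAR mean: {mcar_mean:,.0f}")
print(f"MAR mean:  {mar_mean:,.0f}")
print(f"MNAR mean: {mnar_mean:,.0f}")
```

In real data the other column often correlates with the value, in which case MAR can bias a naive average too. The difference is that the bias can be corrected using the observed column, while under MNAR the information needed for the correction is exactly what is missing.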

Common ways to handle missing data

Drop rows with missing values (simple but throws away data)
Fill with the mean or median of the column (imputation)
Fill with a predicted value from other columns
Flag missingness as its own feature (is_missing = true)
Leave it for the model to handle (some models, such as the gradient-boosted trees in XGBoost and LightGBM, accept NaN directly)

    import pandas as pd
    import numpy as np

    df = pd.read_csv('survey.csv', na_values=['-999', 'N/A', 'unknown'])

    # How much is missing in each column?
    print(df.isna().sum())
    print(df.isna().mean())  # as a fraction

    # Fill age with the median
    df['age'] = df['age'].fillna(df['age'].median())

    # Flag missingness before filling
    df['income_was_missing'] = df['income'].isna()
    df['income'] = df['income'].fillna(df['income'].median())

Detecting and handling missing data in pandas
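
The snippet above covers median filling and flagging. Two other options from the list, dropping rows and filling from other columns, might look like the sketch below. The DataFrame, its column names, and the choice of a per-region median as the "prediction" are assumptions for illustration. A real project might use a regression or a dedicated imputation tool instead.

```python
import pandas as pd

# Hypothetical data; column names and values are made up for illustration.
df = pd.DataFrame({
    "region": ["north", "north", "north", "south", "south", "south"],
    "age":    [25, 31, None, 52, None, 48],
    "income": [40000, None, 45000, 70000, 72000, None],
})

# Option 1: drop every row that has at least one missing value.
complete_rows = df.dropna()

# Option 2: a very simple "predicted" fill that uses the median age of rows
# in the same region instead of the overall median.
region_median_age = df.groupby("region")["age"].transform("median")
age_filled = df["age"].fillna(region_median_age)

print(f"{len(df)} rows before dropping, {len(complete_rows)} after")
print(age_filled.tolist())
```

On this toy table, dropping rows keeps only a third of the data, which is the cost the list warns about.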

The big idea: missing data is not just absence, it is often information. Treat every gap as a question, not an error.
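
One way to treat a gap as a question is to ask whether it clusters in some group. The sketch below computes the share of missing answers within each value of another column. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical survey responses, made up for illustration.
df = pd.DataFrame({
    "gender": ["m", "m", "m", "m", "f", "f", "f", "f"],
    "health_score": [None, 7.0, None, None, 8.0, 6.0, None, 9.0],
})

# Share of missing answers within each group of another column.
missing_rate_by_gender = df["health_score"].isna().groupby(df["gender"]).mean()
print(missing_rate_by_gender)
```

A large difference between groups suggests the data is not MCAR. It cannot tell MAR from MNAR, because that depends on the values that were never recorded.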

End-of-lesson check

8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-missing-data

What is the main idea of "Missing Data and How to Spot It"?
1. Real datasets have holes. Blank cells, NaN, NULL, -999, and the dreaded empty string. Learning to see them is a core skill.
2. Use AI as the final authority for the whole decision
3. Avoid checking the answer once it sounds polished
4. Focus only on speed instead of judgment

Which concept is most central to "Missing Data and How to Spot It"?
1. NaN
2. missing data
3. imputation
4. data quality

Which use of AI fits this topic best?
1. Let the AI decide what matters without your review
2. Use the answer before checking whether it fits the situation
3. MCAR — Missing Completely At Random: a sensor glitched. The gap has nothing to do with the value.
4. Use the first answer without checking it

What should a careful learner remember about "The many faces of missing"?
1. Use AI to draft or organize ideas about missing data, then verify before acting.
2. Skip the context so the tool can guess faster
3. Treat the output as private even after sharing it online
4. Use the answer without checking the source

You want to use AI after this lesson. What is the safest next step?
1. Act immediately because the AI answer is written clearly
2. Use the AI answer as a draft, then check it against a reliable source.
3. Hide uncertainty so the final answer looks cleaner
4. Use private or sensitive details before checking permission

How should AI output about missing data be treated?
1. As proof that no other source is needed
2. As a replacement for context, consent, or expert review
3. As a draft or helper output that still needs human judgment and verification
4. As something that becomes correct when it sounds confident

Name one way to verify an AI answer about missing data.

Which action would help you apply "Missing Data and How to Spot It" responsibly?
1. Use the tool to avoid thinking through the tradeoff
2. Keep going even if the output conflicts with a trusted source
3. Use the first answer without checking it
4. MAR — Missing At Random: men are less likely to answer a health survey question.
