Real datasets have holes. Blank cells, NaN, NULL, -999, and the dreaded empty string. Learning to see them is a core skill.
In a perfect world, every row would have every column filled in. In reality, datasets are full of gaps. A survey respondent skipped a question. A sensor cut out for three seconds. A database migration dropped a field. All of this creates missing data.
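To make the "many faces" of missing data concrete, here is a minimal sketch of how disguised gaps can hide from pandas. The toy DataFrame, its column names, and the sentinel list are all assumptions made for illustration; a real dataset needs its own list, drawn from its documentation or from inspecting the values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each column hides its gaps in a different way.
raw = pd.DataFrame({
    "age": [34, -999, 51, None],              # sentinel number plus a true gap
    "city": ["Lima", "", "unknown", "Oslo"],  # empty string and a placeholder word
    "income": ["52000", "N/A", None, "61000"],
})

# Before cleaning, pandas only counts true None/NaN cells as missing.
missing_before = raw.isna().sum()

# Assumed sentinel values for this toy data only.
sentinels = [-999, "", "N/A", "unknown"]
clean = raw.replace(sentinels, np.nan)

# After cleaning, the disguised gaps are counted too.
missing_after = clean.isna().sum()

print(missing_before)
print(missing_after)
```

Comparing the two counts shows how many gaps were invisible until the sentinels were converted. Whether a value such as -999 really means "missing" is a judgment about the data, so it is worth confirming before replacing it.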
    import pandas as pd
    import numpy as np

    df = pd.read_csv('survey.csv', na_values=['-999', 'N/A', 'unknown'])

    # How much is missing in each column?
    print(df.isna().sum())
    print(df.isna().mean())  # as a fraction

    # Fill age with the median
    df['age'] = df['age'].fillna(df['age'].median())

    # Flag missingness before filling
    df['income_was_missing'] = df['income'].isna()
    df['income'] = df['income'].fillna(df['income'].median())

Detecting and handling missing data in pandas

The big idea: missing data is not just absence, it is often information. Treat every gap as a question, not an error.
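One way to treat a gap as a question is to ask whether missingness varies across groups. The sketch below assumes a hypothetical `employment` column and made-up values; it is not taken from the survey file above. It compares the share of missing `income` values per group.

```python
import numpy as np
import pandas as pd

# Hypothetical survey rows; the values are invented for illustration.
df = pd.DataFrame({
    "employment": ["employed", "employed", "employed",
                   "unemployed", "unemployed", "unemployed"],
    "income": [52000, 61000, np.nan, np.nan, np.nan, 18000],
})

# Flag the gaps first, so the information survives any later filling.
df["income_was_missing"] = df["income"].isna()

# Share of missing income within each employment group.
missing_rate = df.groupby("employment")["income_was_missing"].mean()
print(missing_rate)
```

If the rate differs sharply between groups, the gaps are probably not random, and filling them with a single overall median may distort the picture. With only a handful of rows, as here, such a difference is a prompt for further checking rather than evidence.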
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-missing-data
What is the main idea of "Missing Data and How to Spot It"?
Which concept is most central to "Missing Data and How to Spot It"?
Which use of AI fits this topic best?
What should a careful learner remember about "The many faces of missing"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about missing data be treated?
Name one way to verify an AI answer about missing data.
Which action would help you apply "Missing Data and How to Spot It" responsibly?