The Five Types of Data You Will Meet
Every column in a dataset has a type: number, text, date, boolean, or identifier. Mixing them up causes most beginner bugs.
Section 1
Types Matter More Than You Think
In a CSV file, everything is stored as text. When you load it into Python or a database, each column is assigned a type, either guessed automatically or set by you. Get the types wrong and your model may quietly learn from meaningless numbers, or refuse to train at all.
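To see that guessing in action, here is a minimal sketch that loads a small made-up CSV (held in memory with `io.StringIO` instead of a real file) and prints the type pandas infers for each column. The column names and values are invented for illustration. Note that the date column is not parsed as a date unless you ask for it, and text columns show up as `object` in older pandas versions (newer versions may report a dedicated string type).

```python
import io

import pandas as pd

# A tiny CSV held in memory; in the file, every value is just text
csv_text = """name,age,height_cm,joined
Ana,17,170.5,2026-04-23
Ben,16,165.0,2026-04-20
"""

df = pd.read_csv(io.StringIO(csv_text))

# pandas infers one type per column while loading:
# age becomes an integer column, height_cm a float column,
# and name/joined stay as text (joined is NOT a date yet)
print(df.dtypes)
```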
The five you must know
| Type | Example | Common pitfall |
|---|---|---|
| Numeric | 17.5, 1000 | Zip codes look numeric but are not |
| Text (string) | hello, Claude | Dates often masquerade as strings |
| Date/time | 2026-04-23 | Time zones cause silent bugs |
| Boolean | true, false | 1 and 0 are often used instead |
| Identifier (ID) | user_a7b9c | Do not compute statistics on IDs |
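Two of the pitfalls in this table are easy to reproduce. The sketch below uses a hypothetical in-memory CSV with a `zip_code` column and an `is_active` flag stored as 1/0. Loaded with default settings, the zip code `02139` loses its leading zero because pandas reads it as a number; declaring the column as a string at load time keeps it intact, and converting the flag gives real booleans.

```python
import io

import pandas as pd

# Hypothetical data: a zip code with a leading zero and a 1/0 flag
csv_text = """user_id,zip_code,is_active
user_a7b9c,02139,1
user_b81f2,10001,0
"""

# Default load: pandas guesses zip_code is numeric and drops the zero
naive = pd.read_csv(io.StringIO(csv_text))
print(naive['zip_code'].tolist())  # [2139, 10001]

# Explicit load: keep zip_code as text
df = pd.read_csv(io.StringIO(csv_text), dtype={'zip_code': str})

# Turn the 1/0 flag into real booleans (assumes the column holds only 1 and 0)
df['is_active'] = df['is_active'].astype(bool)

print(df['zip_code'].tolist())   # ['02139', '10001']
print(df['is_active'].tolist())  # [True, False]
```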
Numeric: continuous vs. discrete
Continuous numbers can take any value within a range (height: 170.5 cm). Discrete numbers are whole counts (students in a class: 27). The difference matters when you choose or post-process a model: a regression model that treats class size as continuous might predict 27.3 students, which is meaningless for a count.
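One simple way to turn continuous predictions back into valid counts is to round them and forbid negatives. This is a minimal sketch, assuming the `raw_predictions` array stands in for a hypothetical model's output; it is one post-processing option, not the only way to handle count data.

```python
import numpy as np

# Hypothetical raw outputs from a regression model predicting class sizes
raw_predictions = np.array([27.3, 24.8, -0.4])

# A count must be a whole number and cannot be negative:
# round to the nearest integer, then clip anything below zero up to zero
counts = np.clip(np.rint(raw_predictions), 0, None).astype(int)

print(counts.tolist())  # [27, 25, 0]
```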
Categorical data, disguised as numbers
Colors such as red, blue, and green are categories. Some datasets encode them as 1, 2, 3 to save space, but computing an "average color" from those codes (say, 1.8) is nonsense. Models need to be told these columns are categorical, not numeric.
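The sketch below shows the problem and one common fix. The code-to-color mapping (1 = red, 2 = blue, 3 = green) is a hypothetical encoding chosen for illustration. Averaging the raw codes produces the meaningless 1.8; mapping them back to labels, marking the column as categorical, and one-hot encoding it with `pd.get_dummies` gives a model one 0/1 column per color instead of a fake numeric scale.

```python
import pandas as pd

# Colors stored as numeric codes (hypothetical encoding: 1=red, 2=blue, 3=green)
colors = pd.Series([1, 2, 2, 3, 1])
print(colors.mean())  # 1.8, a meaningless "average color"

# Map the codes back to labels and mark the column as categorical
labels = colors.map({1: 'red', 2: 'blue', 3: 'green'}).astype('category')
print(labels.value_counts().to_dict())

# One-hot encoding: one indicator column per category
dummies = pd.get_dummies(labels)
print(dummies)
```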
Setting types correctly in pandas
```python
import pandas as pd

# Read the file once, forcing zip_code to stay a string
# so leading zeros (e.g. 02139) are not lost
df = pd.read_csv('students.csv', dtype={'zip_code': str})

# Parse the birthday column into real datetime values
df['birthday'] = pd.to_datetime(df['birthday'])

# Mark grade as categorical so it is not treated as a number
df['grade'] = df['grade'].astype('category')

# Check the type pandas now holds for every column
print(df.dtypes)
```
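The table above warns that time zones cause silent bugs. A minimal sketch of one safeguard, assuming hypothetical timestamp strings that carry UTC offsets: passing `utc=True` to `pd.to_datetime` converts every value to a single zone, so two strings describing the same instant compare as equal.

```python
import pandas as pd

# Two strings for the same moment, written with different UTC offsets
stamps = pd.Series(['2026-04-23 09:00:00+02:00', '2026-04-23 07:00:00+00:00'])

# utc=True converts every timestamp to one common zone (UTC)
parsed = pd.to_datetime(stamps, utc=True)

print(parsed.nunique())  # 1: both strings describe the same instant
```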
The big idea: types are the skeleton of a dataset. Get them right at load time, and everything downstream gets easier. Get them wrong, and you will spend hours chasing phantom bugs.
