The Five Types of Data You Will Meet
Every column in a dataset has a type: number, text, date, boolean, or identifier. Mixing them up is behind many beginner bugs.
Builders · AI Foundations · ~15 min read
Types Matter More Than You Think
In a CSV file, everything looks like text. But when you load it into Python or a database, each column gets a type. Get the types wrong and your model will behave bizarrely, or refuse to train at all.
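To make this concrete, here is a small sketch. The sample data and column names are made up for illustration. It reads the same text two ways: Python's built-in `csv` module leaves every value as a string, while pandas guesses a type for each column.

```python
import csv
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
raw = "name,age,joined\nAda,12,2026-04-23\nLin,13,2026-05-01\n"

# The csv module does no type conversion: every value is a str.
rows = list(csv.DictReader(io.StringIO(raw)))
print(type(rows[0]["age"]))  # <class 'str'>

# pandas guesses a type for each column when it loads the same text.
df = pd.read_csv(io.StringIO(raw))
print(df.dtypes)
```

In this sketch the `age` column comes back as an integer type, but `joined` is still treated as text, because pandas does not parse dates unless asked to.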
The five you must know
| Type | Example | Common pitfall |
|---|---|---|
| Numeric | 17.5, 1000 | Zip codes look numeric but are not |
| Text (string) | hello, Claude | Dates often masquerade as strings |
| Date/time | 2026-04-23 | Time zones cause silent bugs |
| Boolean | true, false | 1 and 0 are often used instead |
| Identifier (ID) | user_a7b9c | Do not compute statistics on IDs |
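Two of the pitfalls in the table can be seen in a few lines. This is a minimal sketch with made-up rows (the second user ID and the `is_member` column are hypothetical): it shows a zip code losing its leading zero when pandas guesses the type, and a 1/0 column being turned into real booleans.

```python
import io

import pandas as pd

# Hypothetical sample rows; only user_a7b9c comes from the table above.
raw = "user_id,zip_code,is_member\nuser_a7b9c,02134,1\nuser_k2m4x,90210,0\n"

# Default loading: zip_code is inferred as an integer, so "02134" becomes 2134.
guessed = pd.read_csv(io.StringIO(raw))
print(guessed["zip_code"].tolist())  # [2134, 90210]

# Declaring types at load time keeps the leading zero.
typed = pd.read_csv(io.StringIO(raw), dtype={"zip_code": str, "user_id": str})
print(typed["zip_code"].tolist())  # ['02134', '90210']

# Convert the 1/0 flag into a real boolean column.
typed["is_member"] = typed["is_member"].astype(bool)
print(typed["is_member"].tolist())  # [True, False]
```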
Numeric: continuous vs. discrete
Continuous numbers can take any value within a range (height: 170.5 cm). Discrete numbers are whole counts (students in a class: 27). Machine learning models can treat the two differently: a model that outputs continuous values might predict 27.3 students, which is not a possible count.
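The difference shows up as soon as you do arithmetic. The sketch below uses made-up numbers and a hypothetical helper, `is_whole`, to check whether a column contains only whole numbers.

```python
import pandas as pd

# Hypothetical class sizes (discrete counts) and heights in cm (continuous).
class_sizes = pd.Series([27, 31, 24, 27])
heights_cm = pd.Series([170.5, 162.0, 158.3])

# An average is a continuous quantity even when every input is a whole count.
mean_size = class_sizes.mean()
print(mean_size)  # 27.25

# If a count is required, round explicitly.
print(round(float(mean_size)))  # 27


def is_whole(series):
    """Return True if every value in the series is a whole number."""
    return bool((series % 1 == 0).all())


print(is_whole(class_sizes))  # True
print(is_whole(heights_cm))   # False
```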
Categorical data, disguised as numbers
Colors like red, blue, and green are categories. Datasets sometimes encode them as 1, 2, 3 to save space. But the average of those codes (say, 1.8) is nonsense, because the numbers are only labels. Models need to be told these columns are categorical, not numeric.
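Here is a short sketch of the problem and one way to handle it. The code-to-color mapping is an assumption made for this example; a real dataset would document its own.

```python
import pandas as pd

# Hypothetical encoding: 1 = red, 2 = blue, 3 = green.
codes = pd.Series([1, 2, 3, 2, 1])

# pandas will happily average the codes, but the result matches no color.
print(codes.mean())  # 1.8

# Map the codes back to labels and mark the column as categorical.
labels = {1: "red", 2: "blue", 3: "green"}
colors = codes.map(labels).astype("category")

# Counting how often each category appears is a summary that does make sense.
print(colors.value_counts())
```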
Setting types correctly in pandas
    import pandas as pd

    # Load the file, forcing zip code to stay a string
    df = pd.read_csv('students.csv', dtype={'zip_code': str})

    # Parse a date column properly
    df['birthday'] = pd.to_datetime(df['birthday'])

    # Mark a column as categorical
    df['grade'] = df['grade'].astype('category')

    # Confirm that each column has the type you intended
    print(df.dtypes)
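Because `students.csv` is not included with the lesson, the sketch below uses made-up file contents held in a string so it can run anywhere. It applies the same three steps and then checks that each column ended up with the intended type. The checks are one possible approach, not a required pattern.

```python
import io

import pandas as pd

# Hypothetical contents for students.csv; the real file is not shown in the lesson.
raw = "zip_code,birthday,grade\n02134,2014-03-09,A\n90210,2013-11-30,B\n"

df = pd.read_csv(io.StringIO(raw), dtype={"zip_code": str})
df["birthday"] = pd.to_datetime(df["birthday"])
df["grade"] = df["grade"].astype("category")

# Fail early if a column did not end up with the intended type.
assert df["zip_code"].iloc[0] == "02134"
assert pd.api.types.is_datetime64_any_dtype(df["birthday"])
assert isinstance(df["grade"].dtype, pd.CategoricalDtype)

# Real date columns support date accessors such as .dt.year.
print(df["birthday"].dt.year.tolist())  # [2014, 2013]
```

Checking types right after loading means a surprise shows up as a clear error at the top of a script instead of as a strange result much later.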
The big idea: types are the skeleton of a dataset. Get them right at load time, and everything downstream gets easier. Get them wrong, and you will spend hours chasing phantom bugs.
