Every column in a dataset has a type: number, text, date, boolean, or identifier. Mixing them up is one of the most common sources of beginner bugs.
In a CSV file, everything looks like text. But when you load it into Python or a database, each column gets a type. Get the types wrong and your model will behave bizarrely, or refuse to train at all.
| Type | Example | Common pitfall |
|---|---|---|
| Numeric | 17.5, 1000 | Zip codes look numeric but are not |
| Text (string) | hello, Claude | Dates often masquerade as strings |
| Date/time | 2026-04-23 | Time zones cause silent bugs |
| Boolean | true, false | 1 and 0 are often used instead |
| Identifier (ID) | user_a7b9c | Do not compute statistics on IDs |
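Two of the pitfalls in the table can be reproduced in a few lines. The sketch below is illustrative only: the inline CSV text and the column names `user_id`, `zip_code`, and `is_active` are made up for the example, and `io.StringIO` stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical inline data standing in for a real CSV file.
csv_text = "user_id,zip_code,is_active\nuser_a7b9c,02134,1\nuser_k2m4x,90210,0\n"

# Default loading: pandas infers zip_code as an integer and drops the leading zero.
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred.dtypes)
print(inferred["zip_code"].tolist())

# Explicit types: keep zip_code as text, then turn the 1/0 flag into a real boolean.
typed = pd.read_csv(io.StringIO(csv_text), dtype={"zip_code": str})
typed["is_active"] = typed["is_active"].astype(bool)
print(typed["zip_code"].tolist())
print(typed["is_active"].tolist())
```

With the default settings the zip code `02134` is read as the number 2134, which is why the zip code column is forced to a string here.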
Continuous numbers can take any value in a range (height: 170.5 cm). Discrete numbers are whole counts (students in class: 27). Machine learning models need to treat these differently: a model that treats a count as continuous might predict 27.3 students, which is meaningless.
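A minimal sketch of the distinction, using made-up values. The column names and the prediction of 27.3 are hypothetical, and rounding is only one simple way to turn a continuous output back into a count, not a recommendation for every model.

```python
import pandas as pd

# Hypothetical measurements: height is continuous, class size is a discrete count.
df = pd.DataFrame({
    "height_cm": [170.5, 162.0, 181.3],
    "students_in_class": [27, 31, 24],
})
print(df.dtypes)  # height_cm gets a float dtype, students_in_class an integer dtype

# A made-up continuous prediction for a class size, as a regression model might return.
predicted_students = 27.3

# One simple (assumed) post-processing step: round to the nearest whole count.
rounded_students = int(round(predicted_students))
print(rounded_students)
```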
Colors like red, blue, and green are categories. Datasets sometimes encode them as 1, 2, 3 to save space, but computing an average category (1.8) is nonsense. Models need to know these values are categorical, not numeric.
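The sketch below shows why averaging category codes is meaningless and what to do instead. The code-to-color mapping (1 = red, 2 = blue, 3 = green) is an assumption made for the example; one-hot encoding with `pd.get_dummies` is one common option, not the only one.

```python
import pandas as pd

# Hypothetical color codes: 1 = red, 2 = blue, 3 = green.
codes = pd.Series([1, 2, 3, 1, 2])
print(codes.mean())  # arithmetic on labels: the result has no meaning

# Map the codes back to labels and mark the column as categorical.
labels = {1: "red", 2: "blue", 3: "green"}
colors = codes.map(labels).astype("category")
print(colors.value_counts())

# One-hot encoding gives a model one indicator column per category.
one_hot = pd.get_dummies(colors)
print(one_hot.columns.tolist())
```

Counting how often each category appears is a meaningful summary; the mean of the codes is not.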
```python
import pandas as pd

# Load the file once, forcing zip code to stay a string
df = pd.read_csv('students.csv', dtype={'zip_code': str})

# Parse a date column properly
df['birthday'] = pd.to_datetime(df['birthday'])

# Mark a column as categorical
df['grade'] = df['grade'].astype('category')

# Confirm that each column now has the intended type
print(df.dtypes)
```

Setting types correctly in pandas

The big idea: types are the skeleton of a dataset. Get them right at load time, and everything downstream gets easier. Get them wrong, and you will spend hours chasing phantom bugs.
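To make "right at load time" concrete, the types can also be declared in the `read_csv` call itself. This is a sketch under stated assumptions: the inline CSV text is invented for the example, and it reuses the `zip_code`, `birthday`, and `grade` column names from the lesson's code.

```python
import io
import pandas as pd

# Hypothetical inline data standing in for students.csv.
csv_text = "zip_code,birthday,grade\n02134,2010-04-23,A\n90210,2011-09-01,B\n"

# Declare every type in one call instead of fixing columns afterwards.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"zip_code": str, "grade": "category"},
    parse_dates=["birthday"],
)
print(df.dtypes)
```

Declaring types in a single call keeps all type decisions in one place, which makes them easier to review when the file's columns change.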
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-types-of-data
What is the core idea behind "The Five Types of Data You Will Meet"?
Which term best describes a foundational idea in "The Five Types of Data You Will Meet"?
A learner studying The Five Types of Data You Will Meet would need to understand which concept?
Which of these is directly relevant to The Five Types of Data You Will Meet?
What is the key insight about "The zip code trap" in the context of The Five Types of Data You Will Meet?
Which statement accurately describes an aspect of The Five Types of Data You Will Meet?
What does working with The Five Types of Data You Will Meet typically involve?
Which of the following is true about The Five Types of Data You Will Meet?
Which best describes the scope of "The Five Types of Data You Will Meet"?
Which section heading best belongs in a lesson about The Five Types of Data You Will Meet?
Which of the following is a concept covered in The Five Types of Data You Will Meet?