Every column in a dataset has a type: number, text, date, boolean, or identifier. Mixing them up is behind many beginner bugs.
In a CSV file, every value is stored as plain text. But when you load it into Python or a database, each column is assigned a type, either inferred automatically or declared by you. Get the types wrong and your model may behave strangely or refuse to train at all.
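To see that a CSV is just text until it is loaded, here is a minimal sketch that feeds pandas a small in-memory CSV (the column names and values are made up for illustration) and prints the types pandas infers. Notice that the date column is not recognized as a date on its own.

```python
import io

import pandas as pd

# A tiny CSV held in memory; every value in it is just text
raw_csv = """age,name,signup
17,Ana,2026-04-23
16,Ben,2026-04-24
"""

df = pd.read_csv(io.StringIO(raw_csv))

# pandas guesses a type for each column while loading
print(df.dtypes)
# age is inferred as an integer; signup is left as text, not a date
```
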
| Type | Example | Common pitfall |
|---|---|---|
| Numeric | 17.5, 1000 | Zip codes look numeric but are not |
| Text (string) | hello, Claude | Dates often masquerade as strings |
| Date/time | 2026-04-23 | Time zones cause silent bugs |
| Boolean | true, false | 1 and 0 are often used instead |
| Identifier (ID) | user_a7b9c | Do not compute statistics on IDs |
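The zip code and boolean pitfalls from the table are easy to reproduce. This is a minimal sketch with hypothetical rows: loading with default inference silently drops a leading zero, while declaring the type at load time keeps it, and a 1/0 flag is converted into a real boolean.

```python
import io

import pandas as pd

# Hypothetical data: zip codes with leading zeros, a 1/0 flag, and an ID
raw_csv = """user_id,zip_code,active
user_a7b9c,02134,1
user_x1y2z,10001,0
"""

# Default loading: pandas sees only digits and makes zip_code a number
naive = pd.read_csv(io.StringIO(raw_csv))
print(naive["zip_code"].tolist())  # [2134, 10001], the leading zero is gone

# Declaring the type at load time keeps zip codes as text
fixed = pd.read_csv(io.StringIO(raw_csv), dtype={"zip_code": str})
print(fixed["zip_code"].tolist())  # ['02134', '10001']

# The 1/0 flag loads as integers; convert it to a real boolean
fixed["active"] = fixed["active"].astype(bool)
print(fixed["active"].tolist())  # [True, False]
```

The `user_id` column is left as text on purpose: it identifies rows and should never be averaged or summed.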
Continuous numbers can take any value within a range (height: 170.5 cm). Discrete numbers are whole counts (students in a class: 27). Machine learning models should treat these differently: a model that treats a count as continuous might predict 27.3 students, which is meaningless.
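A minimal sketch of the difference, using made-up values: pandas stores the heights as floats and the counts as integers. If a model (hypothetical here) outputs a fractional count, one simple option is to round it to a whole, non-negative number before reporting it.

```python
import pandas as pd

# Made-up data: one continuous column and one discrete count column
df = pd.DataFrame({
    "height_cm": [170.5, 162.0, 181.3],  # continuous: any value in a range
    "class_size": [27, 31, 24],          # discrete: whole counts only
})
print(df.dtypes)

# Hypothetical output from a model that treated class_size as continuous
predicted_class_size = 27.3

# One simple fix: round to the nearest whole count and never go below zero
predicted_count = max(0, round(predicted_class_size))
print(predicted_count)  # 27
```

Rounding is only a quick patch applied after the fact; it does not teach the model that the target is a count.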
Colors such as red, blue, and green are categories. Some datasets encode them as 1, 2, 3 to save space, but computing an average category (1.8) is nonsense. Models need to be told that these columns are categorical, not numeric.
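Here is a minimal sketch of the color example. The code-to-color mapping is an assumption made up for illustration. It shows the meaningless average, converts the codes back into a pandas categorical column, and then one-hot encodes it with `pd.get_dummies` so a model sees one indicator column per color instead of a fake numeric order.

```python
import pandas as pd

# Hypothetical color column stored as integer codes to save space
codes = pd.Series([1, 2, 3, 1, 2], name="color")

# Treating codes as numbers gives a meaningless "average color"
print(codes.mean())  # 1.8

# Assumed mapping from codes back to labels (not from any real dataset)
code_to_color = {1: "red", 2: "blue", 3: "green"}
colors = codes.map(code_to_color).astype("category")
print(colors.cat.categories.tolist())  # ['blue', 'green', 'red']

# One-hot encoding: one indicator column per color, with no fake ordering
one_hot = pd.get_dummies(colors, prefix="color")
print(one_hot.columns.tolist())  # ['color_blue', 'color_green', 'color_red']
```
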
Setting types correctly in pandas:

```python
import pandas as pd

# Read the file once, forcing zip_code to stay a string so leading zeros survive
df = pd.read_csv('students.csv', dtype={'zip_code': str})

# Parse the birthday column into real datetime values
df['birthday'] = pd.to_datetime(df['birthday'])

# Mark grade as categorical so it is not treated as a number or free text
df['grade'] = df['grade'].astype('category')

# Check the resulting type of every column
print(df.dtypes)
```

The big idea: types are the skeleton of a dataset. Get them right at load time, and everything downstream gets easier. Get them wrong, and you will spend hours chasing phantom bugs.
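Getting types right at load time is easier to trust if you check them. Below is a minimal sketch, assuming a hypothetical helper named `check_dtypes` and a hand-written dictionary of expected types (neither is part of pandas). It uses a small in-memory stand-in for `students.csv` with made-up rows, and sets all three types directly in `pd.read_csv` through its `dtype` and `parse_dates` parameters.

```python
import io

import pandas as pd

# In-memory stand-in for students.csv (hypothetical rows)
raw_csv = """zip_code,birthday,grade
02134,2010-05-01,A
10001,2011-09-12,B
"""

df = pd.read_csv(
    io.StringIO(raw_csv),
    dtype={"zip_code": str, "grade": "category"},
    parse_dates=["birthday"],
)


def check_dtypes(frame, expected):
    """Hypothetical helper: return the columns whose type check fails."""
    return [col for col, is_ok in expected.items() if not is_ok(frame[col])]


# Expected type for each column, expressed as pandas type-check functions
expected = {
    "zip_code": lambda s: not pd.api.types.is_numeric_dtype(s),
    "birthday": pd.api.types.is_datetime64_any_dtype,
    "grade": lambda s: isinstance(s.dtype, pd.CategoricalDtype),
}

problems = check_dtypes(df, expected)
print(problems if problems else "All column types look right")

# Loading without type hints: all three columns should be flagged
untyped = pd.read_csv(io.StringIO(raw_csv))
print(check_dtypes(untyped, expected))
```

Running a check like this right after loading turns a silent type mistake into a visible list of column names.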
6 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-types-of-data
What is the main idea of "The Five Types of Data You Will Meet"?
Which concept is most central to "The Five Types of Data You Will Meet"?
What should a careful learner remember about "The zip code trap"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about data types be treated?
Name one way to verify an AI answer about data types.