Pandas Fundamentals in 40 Minutes
Pandas is the Python library that made data science what it is today. Ten verbs get you through 90 percent of day-to-day data work.
Section 1
Pandas Is the Table API
Pandas was created in 2008 by Wes McKinney while he worked at the hedge fund AQR Capital Management. Today it is the default Python library for tabular data, downloaded over 100 million times per month. Its two main types are Series (a single labeled column) and DataFrame (a table whose columns are Series sharing one row index).
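To make the two types concrete, here is a minimal sketch. The column names and values are invented for illustration and do not come from any dataset in this lesson.

```python
import pandas as pd

# Made-up rows, purely for illustration.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Chi"],
    "age": [34, 17, 52],
})

ages = df["age"]      # selecting one column gives a Series
subset = df[["age"]]  # a list of columns gives a DataFrame, even with one column

print(type(ages).__name__)    # Series
print(type(subset).__name__)  # DataFrame
print(df.shape)               # (3, 2): 3 rows, 2 columns
```

The single versus double brackets are the part worth remembering: the same distinction shows up again when selecting columns.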
Ten verbs you will use constantly
The ten most important pandas operations
import pandas as pd
# 1. Load
df = pd.read_csv('data.csv')
# 2. Peek
df.head()
df.info()
df.describe()
# 3. Select columns
df['age'] # one column (Series)
df[['age', 'income']] # multiple columns (DataFrame)
# 4. Filter rows
df[df['age'] > 18]
df[(df['age'] > 18) & (df['country'] == 'US')]
# 5. Sort
df.sort_values('income', ascending=False)
# 6. Create columns
df['income_per_age'] = df['income'] / df['age']
# 7. Group and aggregate
df.groupby('country')['income'].mean()
df.groupby(['country', 'gender']).agg({
    'income': ['mean', 'median'],
    'age': 'mean'
})
# 8. Join tables
merged = pd.merge(df, other_df, on='user_id', how='left')
# 9. Pivot
pd.pivot_table(df, index='country', columns='year', values='income')
# 10. Save
df.to_csv('clean.csv', index=False)
df.to_parquet('clean.parquet')
Indexing: the most confusing part
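Much of the confusion comes from the fact that row labels and row positions look identical on a freshly loaded table, then drift apart after filtering or sorting. The sketch below uses a made-up frame with a deliberately shuffled index so the difference is visible. The names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data; the index labels are deliberately not 0, 1, 2, ...
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Chi", "Dev"],
        "age": [34, 17, 52, 25],
        "score": [0, 0, 0, 0],
    },
    index=[10, 5, 7, 3],
)

by_label = df.loc[5]      # the row whose index label is 5
by_position = df.iloc[2]  # the third row, whatever its label is

# Row filter and column choice in a single .loc call
adult_names = df.loc[df["age"] > 18, "name"].tolist()

# Assigning through one .loc call writes to df itself
df.loc[df["age"] > 18, "score"] = 100

print(by_label["name"], by_position["name"])  # Ben Chi
print(adult_names)                            # ['Ana', 'Chi', 'Dev']
print(df["score"].tolist())                   # [100, 0, 100, 100]
```

The assignment works because the row mask and the column name go through a single indexing call, so pandas knows the target is the original frame rather than a temporary copy.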
Correct indexing patterns
# .loc uses labels
df.loc[5] # row with index label 5
df.loc[df['age'] > 18, 'name'] # name column, filtered rows
# .iloc uses positions
df.iloc[5] # 6th row regardless of index label
df.iloc[:10, :3] # first 10 rows, first 3 cols
# Chained assignment is a trap
# df[df.age > 18]['score'] = 100 # DO NOT DO THIS
df.loc[df.age > 18, 'score'] = 100 # CORRECT
Common patterns worth memorizing
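The patterns in this part of the lesson are easier to trust after running them on a frame small enough to check by hand. The following sketch assumes a made-up table of daily sales with one row per day. The 3-row window is chosen only to keep the arithmetic short.

```python
import pandas as pd

# Made-up daily sales, one row per day, already in date order.
df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"],
    "country": ["USA", "US", "U.S.A.", "US"],
    "sales": [10, 20, 30, 40],
})

# Strings -> datetimes, then the .dt accessor exposes calendar fields
df["date"] = pd.to_datetime(df["date"])
df["day_of_week"] = df["date"].dt.day_name()

# Normalize spelling variants with a mapping; unmapped values stay as they are
df["country"] = df["country"].replace({"USA": "US", "U.S.A.": "US"})

# A 3-row rolling mean. The first two rows are NaN because the window
# is not yet full; the third is (10 + 20 + 30) / 3.
df["3d_avg"] = df["sales"].rolling(window=3).mean()

print(df)
```

Note that `rolling(window=3)` counts rows, not calendar days, so the result only reads as a daily average when the table has exactly one row per day in date order.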
Patterns you will use every week
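# Top N per group: sort once, then keep the first 3 rows of each group.
# This avoids groupby().apply(), whose handling of the grouping column
# differs across pandas versions.
top3 = (
    df.sort_values('income', ascending=False)
    .groupby('country')
    .head(3)
    .reset_index(drop=True)
)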
# Rolling stats (window=7 counts rows; it is a 7-day average only with one row per day, sorted by date)
df['7d_avg'] = df['sales'].rolling(window=7).mean()
# Replace based on mapping
df['country'] = df['country'].replace({'USA': 'US', 'U.S.A.': 'US'})
# One-hot encoding
df_encoded = pd.get_dummies(df, columns=['color'])
# Handle dates
df['date'] = pd.to_datetime(df['date'])
df['day_of_week'] = df['date'].dt.day_name()
The big idea: pandas rewards the ten verbs you use 90 percent of the time. Master those before chasing fancier features, and the other 10 percent will come naturally when you need it.
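To practise several of the verbs without a CSV file on disk, a small in-memory frame is enough. The sketch below is one possible exercise, not part of the lesson's dataset: both tables, their column names, and their values are invented stand-ins for `data.csv` and `other_df`.

```python
import pandas as pd

# Hypothetical tables standing in for data.csv and other_df.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "country": ["US", "US", "FR", "FR"],
    "year": [2023, 2024, 2023, 2024],
    "age": [30, 16, 45, 28],
    "income": [50000, 0, 70000, 40000],
})
other_df = pd.DataFrame({"user_id": [1, 2, 3], "plan": ["free", "pro", "pro"]})

adults = df[df["age"] > 18]                             # filter rows
ranked = adults.sort_values("income", ascending=False)  # sort
mean_income = df.groupby("country")["income"].mean()    # group and aggregate

# A left join keeps every row of df; user 4 has no match, so plan is missing
merged = pd.merge(df, other_df, on="user_id", how="left")

# One row per country, one column per year (mean is the default aggregation)
wide = pd.pivot_table(df, index="country", columns="year", values="income")

print(ranked[["user_id", "income"]])
print(mean_income)
print(merged)
print(wide)
```

Checking the row count after a merge, as in `len(merged) == len(df)`, is a cheap habit that catches accidental duplication when the join key is not unique in the right-hand table.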