Pandas Fundamentals in 40 Minutes
Pandas is the Python library that made data science what it is today. Ten verbs get you through 90 percent of day-to-day data work.
Creators · AI Foundations · ~27 min read
Pandas Is the Table API
Pandas was created in 2008 by Wes McKinney at a hedge fund. Today it is the default Python library for tabular data, downloaded over 100 million times per month. Its two main types are Series (a single column) and DataFrame (a table).
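To make the two types concrete, here is a minimal sketch on a small made-up table. The column names and values are assumptions for illustration, not data from this lesson.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe"],
    "age": [34, 19, 52],
})

ages = df["age"]  # selecting one column gives a Series

print(type(df).__name__)    # DataFrame
print(type(ages).__name__)  # Series
print(df.shape)             # (3, 2): three rows, two columns
print(ages.mean())          # 35.0
```

A DataFrame can be thought of as a collection of Series that share one index, which is why most column-level operations return a Series.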
Ten verbs you will use constantly
The ten most important pandas operations
import pandas as pd

# 1. Load
df = pd.read_csv('data.csv')

# 2. Peek
df.head()
df.info()
df.describe()

# 3. Select columns
df['age']              # one column (Series)
df[['age', 'income']]  # multiple columns (DataFrame)

# 4. Filter rows
df[df['age'] > 18]
df[(df['age'] > 18) & (df['country'] == 'US')]

# 5. Sort
df.sort_values('income', ascending=False)

# 6. Create columns
df['income_per_age'] = df['income'] / df['age']

# 7. Group and aggregate
df.groupby('country')['income'].mean()
df.groupby(['country', 'gender']).agg({
    'income': ['mean', 'median'],
    'age': 'mean'
})

# 8. Join tables
merged = pd.merge(df, other_df, on='user_id', how='left')

# 9. Pivot
pd.pivot_table(df, index='country', columns='year', values='income')

# 10. Save
df.to_csv('clean.csv', index=False)
df.to_parquet('clean.parquet')

Indexing: the most confusing part
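The difference between labels and positions is easiest to see on a table whose index is not 0, 1, 2. The sketch below uses a small made-up DataFrame (the names, ages, and index labels are assumptions for illustration) to show that .loc looks up by index label, .iloc looks up by position, and a single .loc call handles a filtered assignment.

```python
import pandas as pd

# Hypothetical data; the index labels are deliberately not 0, 1, 2
df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Chloe"], "age": [34, 17, 52]},
    index=[10, 20, 30],
)

by_label = df.loc[20]     # row whose index label is 20 (Ben)
by_position = df.iloc[0]  # first row by position (Ana)

# Assign through one .loc call: row mask first, column label second
df["score"] = 0
df.loc[df["age"] > 18, "score"] = 100

print(by_label["name"])      # Ben
print(by_position["name"])   # Ana
print(df["score"].tolist())  # [100, 0, 100]
```

On this table, df.loc[0] would raise a KeyError because no row carries the label 0, while df.iloc[0] always returns the first row.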
Correct indexing patterns
# .loc uses labels
df.loc[5]                       # row with index label 5
df.loc[df['age'] > 18, 'name']  # name column, filtered rows

# .iloc uses positions
df.iloc[5]        # 6th row regardless of index label
df.iloc[:10, :3]  # first 10 rows, first 3 cols

# Chained assignment is a trap
# df[df.age > 18]['score'] = 100    # DO NOT DO THIS
df.loc[df.age > 18, 'score'] = 100  # CORRECT

Common patterns worth memorizing
Patterns you will use every week
# Top N per group
# (recent pandas releases warn when apply() operates on the grouping column;
#  df.sort_values('income', ascending=False).groupby('country').head(3)
#  is an alternative that avoids apply)
top3 = df.groupby('country').apply(
    lambda g: g.nlargest(3, 'income')
).reset_index(drop=True)

# Rolling stats
df['7d_avg'] = df['sales'].rolling(window=7).mean()

# Replace based on mapping
df['country'] = df['country'].replace({'USA': 'US', 'U.S.A.': 'US'})

# One-hot encoding
df_encoded = pd.get_dummies(df, columns=['color'])

# Handle dates
df['date'] = pd.to_datetime(df['date'])
df['day_of_week'] = df['date'].dt.day_name()

The big idea: ten verbs cover about 90 percent of day-to-day pandas work. Master those before chasing fancier features, and the remaining 10 percent will come naturally when you need it.
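To see several of the verbs working together, here is a minimal end-to-end sketch. The two tables and their values are made up for illustration. Only the column names user_id, country, age, and income echo the snippets above.

```python
import pandas as pd

# Hypothetical tables, invented for illustration
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "country": ["US", "US", "DE", "DE"],
    "age": [25, 40, 17, 30],
})
incomes = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "income": [50000, 80000, 0, 60000],
})

# Join the tables, keep adults only, then group and aggregate
merged = pd.merge(users, incomes, on="user_id", how="left")
adults = merged[merged["age"] > 18]
mean_income = adults.groupby("country")["income"].mean()

# Sort to find the highest earner among adults
top_earner = adults.sort_values("income", ascending=False).iloc[0]

print(mean_income["US"])      # 65000.0
print(mean_income["DE"])      # 60000.0
print(top_earner["user_id"])  # 2
```

Each step returns a new object rather than changing the input, so intermediate results such as merged and adults can be inspected with head() while debugging.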
