Reading and Writing CSV and JSON in Python
These two formats are the bread and butter of data interchange. Handling them well means handling edge cases well.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. The Two Formats You Will See Forever
2. CSV
3. JSON
4. File I/O
Section 1
The Two Formats You Will See Forever
CSV is the universal tabular format. JSON is the universal structured format. Almost every data pipeline reads or writes one or both. Knowing the pitfalls prevents hours of debugging.
CSV: the common gotchas
CSV read patterns that avoid pain
import pandas as pd

# Wrong encoding: common error for European or Latin American data
try:
    df = pd.read_csv('data.csv')
except UnicodeDecodeError:
    df = pd.read_csv('data.csv', encoding='latin-1')

# Custom delimiter (common: semicolon in German/French Excel)
df = pd.read_csv('data.csv', sep=';')

# Keep leading zeros on ID-like columns
df = pd.read_csv('data.csv', dtype={'zip_code': str, 'phone': str})

# Parse dates at load time
df = pd.read_csv('data.csv', parse_dates=['signup_date'])

# Treat sentinel values as NaN
df = pd.read_csv('data.csv', na_values=['', 'N/A', '-999', 'unknown'])

# Large files: stream in chunks
for chunk in pd.read_csv('huge.csv', chunksize=10_000):
    process(chunk)

Writing CSV cleanly
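Quoting is where hand-rolled CSV code usually breaks. A minimal sketch with the standard-library `csv` module (in-memory buffer used for illustration): fields containing commas, quotes, or newlines are quoted automatically on write and reassembled intact on read.

```python
import csv
import io

rows = [
    ['id', 'comment'],
    ['1', 'fine'],
    ['2', 'contains, a comma'],
    ['3', 'contains "quotes" and\na newline'],
]

# Write to an in-memory buffer; the default QUOTE_MINIMAL
# quotes only the fields that need it
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(rows)

# Read it back: commas, quotes, and newlines survive
buf.seek(0)
round_tripped = list(csv.reader(buf))
assert round_tripped == rows
```

This is exactly the machinery pandas uses under the hood, which is why `str.split(',')` is never the right way to parse a CSV.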
CSV write patterns
# index=False is almost always what you want
df.to_csv('output.csv', index=False)
# Use UTF-8 (default but be explicit)
df.to_csv('output.csv', encoding='utf-8', index=False)
# Keep floats readable
df.to_csv('output.csv', float_format='%.3f', index=False)
# Compress for big files
df.to_csv('output.csv.gz', compression='gzip', index=False)

JSON: structured data
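A quick orientation before the pandas patterns: unlike CSV's flat rows, JSON round-trips nested structure losslessly. A minimal sketch:

```python
import json

record = {
    'user': {'id': 1, 'name': 'Alice'},
    'tags': ['admin', 'beta'],
    'active': True,
}

# Serialize to a string and back; nesting, lists,
# and booleans all survive the round trip
text = json.dumps(record)
assert json.loads(text) == record
```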
Read and flatten JSON
import json
import pandas as pd
# Simple read
with open('data.json') as f:
    data = json.load(f)

# Line-delimited JSON (common for big data)
df = pd.read_json('data.jsonl', lines=True)

# Deeply nested JSON into a flat table
records = [
    {'user': {'id': 1, 'name': 'Alice'}, 'events': [{'type': 'click'}]}
]
flat = pd.json_normalize(
    records,
    record_path='events',
    meta=[['user', 'id'], ['user', 'name']]
)
print(flat)

Writing JSON
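One gotcha worth knowing before the numpy example: `datetime` objects are not JSON-serializable either. A common workaround (a sketch, not the only option) is `default=str`, which stringifies anything the encoder cannot handle on its own:

```python
import json
from datetime import datetime

event = {'name': 'signup', 'at': datetime(2024, 1, 15, 9, 30)}

# default=str is called for any object json can't serialize,
# so the timestamp becomes the string "2024-01-15 09:30:00"
text = json.dumps(event, default=str)
```

The trade-off: the type information is gone, so whoever reads the file must parse the string back into a datetime themselves.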
Writing JSON with numpy support
import json
import numpy as np
data = {'scores': [0.1, 0.2, 0.3], 'name': 'test'}

# Pretty-printed, human-readable
with open('out.json', 'w') as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

# numpy types crash the default encoder; convert them first
class NPEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, np.integer):
            return int(o)
        if isinstance(o, np.floating):
            return float(o)
        if isinstance(o, np.ndarray):
            return o.tolist()
        return super().default(o)

# This data would raise TypeError with the default encoder
np_data = {'scores': np.array([0.1, 0.2, 0.3]), 'count': np.int64(3)}
with open('out.json', 'w') as f:
    json.dump(np_data, f, cls=NPEncoder, indent=2)
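To round out writing: pandas can also emit the line-delimited format that `read_json(..., lines=True)` consumes, one JSON object per line (filename is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'name': ['Alice', 'Bob']})

# lines=True requires orient='records'; the result streams
# well and can be appended to without re-parsing the file
df.to_json('output.jsonl', orient='records', lines=True)
```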
The big idea: file formats have personalities. Matching format to use case saves time, and handling encoding/quoting edge cases is what separates fragile scripts from reliable pipelines.