Reading and Writing CSV and JSON in Python
These two formats are the bread and butter of data interchange. Handling them well means handling edge cases well.
Creators · AI Foundations · ~18 min read
The Two Formats You Will See Forever
CSV is the universal tabular format. JSON is the universal structured format. Almost every data pipeline reads or writes one or both. Knowing the pitfalls prevents hours of debugging.
CSV: the common gotchas
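Two gotchas account for many broken CSV loads. The first is a delimiter inside a quoted field. The second is a file that is not UTF-8. The sketch below reproduces both in memory with the standard-library `csv` module. The sample strings are made up for illustration. It shows why a naive `split(',')` is unsafe and why decoding Latin-1 bytes as UTF-8 fails.

```python
import csv
import io

# Gotcha 1: a field containing the delimiter must be quoted, and a naive
# split(',') ignores the quotes and breaks the field apart
raw = 'name,city\n"Smith, Jane",Paris\n'
naive = [line.split(',') for line in raw.splitlines()]
parsed = list(csv.reader(io.StringIO(raw)))
print(naive[1])   # ['"Smith', ' Jane"', 'Paris']
print(parsed[1])  # ['Smith, Jane', 'Paris']

# Gotcha 2: text saved as Latin-1 is not valid UTF-8
latin1_bytes = 'name\nJosé\n'.encode('latin-1')
try:
    latin1_bytes.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)                                # False
print(latin1_bytes.decode('latin-1').splitlines())  # ['name', 'José']
```

A real CSV parser (`csv.reader` or `pd.read_csv`) understands quoting rules, which is why it should always be preferred over string splitting.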
CSV read patterns that avoid pain
```python
import pandas as pd

# Non-UTF-8 files (common in older European or Latin American exports)
# raise UnicodeDecodeError; fall back to Latin-1. Latin-1 accepts any byte,
# so spot-check the result (Windows exports are often cp1252)
try:
    df = pd.read_csv('data.csv')
except UnicodeDecodeError:
    df = pd.read_csv('data.csv', encoding='latin-1')

# Custom delimiter (common: semicolon in German/French Excel exports)
df = pd.read_csv('data.csv', sep=';')

# Keep leading zeros on ID-like columns by reading them as strings
df = pd.read_csv('data.csv', dtype={'zip_code': str, 'phone': str})

# Parse dates at load time
df = pd.read_csv('data.csv', parse_dates=['signup_date'])

# Treat sentinel values as NaN
df = pd.read_csv('data.csv', na_values=['', 'N/A', '-999', 'unknown'])

# Large files: stream in chunks of 10,000 rows instead of loading everything
def process(chunk):
    # hypothetical placeholder: replace with your own per-chunk logic
    print(len(chunk))

for chunk in pd.read_csv('huge.csv', chunksize=10_000):
    process(chunk)
```

Writing CSV cleanly
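A quick round trip through an in-memory string shows the two settings that matter most when CSV files are written and read back: `index=False` on write and `dtype=str` on read. This is a minimal sketch. The sample DataFrame is invented for illustration, and `io.StringIO` stands in for a real file.

```python
import io
import pandas as pd

df = pd.DataFrame({'zip_code': ['02134', '10001'], 'city': ['Boston', 'New York']})

# Default to_csv() also writes the RangeIndex as an unnamed first column
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(with_index.columns.tolist())   # ['Unnamed: 0', 'zip_code', 'city']

# index=False round-trips cleanly; dtype=str keeps the leading zero on read
text = df.to_csv(index=False)
back = pd.read_csv(io.StringIO(text), dtype={'zip_code': str})
print(back['zip_code'].tolist())     # ['02134', '10001']

# Without dtype, pandas infers integers and the leading zero is gone
print(pd.read_csv(io.StringIO(text))['zip_code'].tolist())  # [2134, 10001]
```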
CSV write patterns
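```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'score': [0.12345, 0.5]})  # any DataFrame

# index=False is almost always what you want
df.to_csv('output.csv', index=False)

# UTF-8 is the default, but being explicit documents the intent
df.to_csv('output.csv', encoding='utf-8', index=False)

# Keep floats readable (3 decimal places)
df.to_csv('output.csv', float_format='%.3f', index=False)

# Compress big files (pandas would also infer gzip from the .gz extension)
df.to_csv('output.csv.gz', compression='gzip', index=False)
```

JSON: structured data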
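JSON Lines (`.jsonl`) is what `lines=True` refers to. Each line holds one complete JSON object, and there is no enclosing array. The stdlib-only sketch below uses made-up sample records. It parses JSON Lines one line at a time and shows that the same text is not valid as a single JSON document.

```python
import io
import json

# JSON Lines: one complete JSON object per line, no enclosing [ ... ] array
jsonl_text = '{"id": 1, "event": "click"}\n{"id": 2, "event": "view"}\n'

# Parse one line at a time; iterating over open(path, encoding='utf-8')
# works the same way for a real file
records = [json.loads(line) for line in io.StringIO(jsonl_text) if line.strip()]
print(records)  # [{'id': 1, 'event': 'click'}, {'id': 2, 'event': 'view'}]

# The whole text is not one JSON value, so json.loads rejects it
try:
    json.loads(jsonl_text)
    whole_ok = True
except json.JSONDecodeError:
    whole_ok = False
print(whole_ok)  # False
```

Because each line stands alone, a JSON Lines file can be processed record by record, which is why it is popular for large datasets.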
Read and flatten JSON
```python
import json
import pandas as pd

# Simple read of a standard JSON file
with open('data.json', encoding='utf-8') as f:
    data = json.load(f)

# Line-delimited JSON (JSON Lines, one object per line; common for big data)
df = pd.read_json('data.jsonl', lines=True)

# Deeply nested JSON into a flat table:
# one row per event, with user fields repeated as metadata columns
records = [
    {'user': {'id': 1, 'name': 'Alice'}, 'events': [{'type': 'click'}]}
]
flat = pd.json_normalize(
    records,
    record_path='events',
    meta=[['user', 'id'], ['user', 'name']],
)
print(flat)  # columns: type, user.id, user.name
```

Writing JSON
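Writing JSON Lines is the mirror image of reading it with `lines=True`: one `json.dumps` call per record, each followed by a newline. Below is a minimal sketch with a hypothetical `write_jsonl` helper. It uses `io.StringIO` in place of a file opened with `open('out.jsonl', 'w', encoding='utf-8')`.

```python
import io
import json

def write_jsonl(records, f):
    """Hypothetical helper: write one JSON object per line to an open text file."""
    for rec in records:
        # ensure_ascii=False keeps 'Zoë' readable instead of escaping it
        f.write(json.dumps(rec, ensure_ascii=False) + '\n')

rows = [{'id': 1, 'name': 'Zoë'}, {'id': 2, 'name': 'Ann'}]

buf = io.StringIO()  # stands in for a real UTF-8 output file
write_jsonl(rows, buf)
text = buf.getvalue()
print(text)
```

For a DataFrame, pandas offers the same layout via `df.to_json(path, orient='records', lines=True)`.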
Writing JSON with numpy support
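```python
import json
import numpy as np

data = {'scores': [0.1, 0.2, 0.3], 'name': 'test'}

# Pretty-printed, human-readable. ensure_ascii=False writes non-ASCII text as-is,
# so open the file as UTF-8 explicitly rather than relying on the OS default
with open('out.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

# The default encoder raises TypeError on numpy types such as np.int64,
# np.float32, np.bool_ and np.ndarray; convert them to built-in types
class NPEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, np.integer):
            return int(o)
        if isinstance(o, np.floating):
            return float(o)
        if isinstance(o, np.bool_):
            return bool(o)
        if isinstance(o, np.ndarray):
            return o.tolist()
        return super().default(o)

np_data = {'scores': np.array([0.1, 0.2, 0.3]), 'count': np.int64(3)}
with open('out.json', 'w', encoding='utf-8') as f:
    json.dump(np_data, f, cls=NPEncoder, indent=2)
```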
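One more JSON edge case is NaN, which pandas uses for missing floats. Python's `json` module writes it as the bare token `NaN`. That token is not part of the JSON standard, so strict parsers in other languages reject it. The sketch below uses an invented sample dict. It shows the default output, the `allow_nan=False` guard, and one possible cleanup that maps NaN to `null`.

```python
import json
import math

data = {'score': float('nan'), 'name': 'test'}

# By default json emits the bare token NaN, which strict JSON parsers reject
print(json.dumps(data))  # {"score": NaN, "name": "test"}

# allow_nan=False makes json.dumps raise ValueError instead of emitting NaN
try:
    json.dumps(data, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False
print(strict_ok)  # False

# One option: map NaN to None, which serializes as null
clean = {k: (None if isinstance(v, float) and math.isnan(v) else v)
         for k, v in data.items()}
print(json.dumps(clean))  # {"score": null, "name": "test"}
```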
The big idea: file formats have personalities. Matching format to use case saves time, and handling encoding/quoting edge cases is what separates fragile scripts from reliable pipelines.
