Reading and Writing CSV and JSON in Python
These two formats are the bread and butter of data interchange. Handling them well means handling edge cases well.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. The Two Formats You Will See Forever
2. CSV
3. JSON
4. File I/O
Section 1
The Two Formats You Will See Forever
CSV is the universal tabular format. JSON is the universal structured format. Almost every data pipeline reads or writes one or both. Knowing the pitfalls prevents hours of debugging.
CSV: the common gotchas
CSV read patterns that avoid pain
import pandas as pd

# Wrong encoding: common error for European or Latin American data
try:
    df = pd.read_csv('data.csv')
except UnicodeDecodeError:
    df = pd.read_csv('data.csv', encoding='latin-1')

# Custom delimiter (common: semicolon in German/French Excel)
df = pd.read_csv('data.csv', sep=';')

# Keep leading zeros on ID-like columns
df = pd.read_csv('data.csv', dtype={'zip_code': str, 'phone': str})

# Parse dates at load time
df = pd.read_csv('data.csv', parse_dates=['signup_date'])

# Treat sentinel values as NaN
df = pd.read_csv('data.csv', na_values=['', 'N/A', '-999', 'unknown'])

# Large files: stream in chunks
for chunk in pd.read_csv('huge.csv', chunksize=10_000):
    process(chunk)

Writing CSV cleanly
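Quoting is where hand-rolled CSV code usually breaks. A minimal sketch with the standard-library `csv` module (in-memory buffer used for illustration): fields containing commas, quotes, or newlines are quoted automatically on write and reassembled intact on read.

```python
import csv
import io

rows = [
    ['id', 'comment'],
    ['1', 'fine'],
    ['2', 'contains, a comma'],
    ['3', 'contains "quotes" and\na newline'],
]

# Write to an in-memory buffer; the default QUOTE_MINIMAL
# quotes only the fields that need it
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(rows)

# Read it back: commas, quotes, and newlines survive
buf.seek(0)
round_tripped = list(csv.reader(buf))
assert round_tripped == rows
```

This is exactly the machinery pandas uses under the hood, which is why `str.split(',')` is never the right way to parse a CSV.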
CSV write patterns
# index=False is almost always what you want
df.to_csv('output.csv', index=False)
# Use UTF-8 (default but be explicit)
df.to_csv('output.csv', encoding='utf-8', index=False)
# Keep floats readable
df.to_csv('output.csv', float_format='%.3f', index=False)
# Compress for big files
df.to_csv('output.csv.gz', compression='gzip', index=False)

JSON: structured data
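A quick orientation before the pandas patterns: unlike CSV's flat rows, JSON round-trips nested structure losslessly. A minimal sketch:

```python
import json

record = {
    'user': {'id': 1, 'name': 'Alice'},
    'tags': ['admin', 'beta'],
    'active': True,
}

# Serialize to a string and back; nesting, lists,
# and booleans all survive the round trip
text = json.dumps(record)
assert json.loads(text) == record
```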
Read and flatten JSON
import json
import pandas as pd
# Simple read
with open('data.json') as f:
    data = json.load(f)

# Line-delimited JSON (common for big data)
df = pd.read_json('data.jsonl', lines=True)

# Deeply nested JSON into a flat table
records = [
    {'user': {'id': 1, 'name': 'Alice'}, 'events': [{'type': 'click'}]}
]
flat = pd.json_normalize(
    records,
    record_path='events',
    meta=[['user', 'id'], ['user', 'name']]
)
print(flat)

Writing JSON
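One gotcha worth knowing before the numpy example: `datetime` objects are not JSON-serializable either. A common workaround (a sketch, not the only option) is `default=str`, which stringifies anything the encoder cannot handle on its own:

```python
import json
from datetime import datetime

event = {'name': 'signup', 'at': datetime(2024, 1, 15, 9, 30)}

# default=str is called for any object json can't serialize,
# so the timestamp becomes the string "2024-01-15 09:30:00"
text = json.dumps(event, default=str)
```

The trade-off: the type information is gone, so whoever reads the file must parse the string back into a datetime themselves.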
Writing JSON with numpy support
import json
import numpy as np
data = {'scores': [0.1, 0.2, 0.3], 'name': 'test'}

# Pretty-printed, human-readable
with open('out.json', 'w') as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

# numpy types crash the default encoder; convert them first
class NPEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, np.integer):
            return int(o)
        if isinstance(o, np.floating):
            return float(o)
        if isinstance(o, np.ndarray):
            return o.tolist()
        return super().default(o)

# This data would raise TypeError with the default encoder
np_data = {'scores': np.array([0.1, 0.2, 0.3]), 'count': np.int64(3)}
with open('out.json', 'w') as f:
    json.dump(np_data, f, cls=NPEncoder, indent=2)
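To round out writing: pandas can also emit the line-delimited format that `read_json(..., lines=True)` consumes, one JSON object per line (filename is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'name': ['Alice', 'Bob']})

# lines=True requires orient='records'; the result streams
# well and can be appended to without re-parsing the file
df.to_json('output.jsonl', orient='records', lines=True)
```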
The big idea: file formats have personalities. Matching format to use case saves time, and handling encoding/quoting edge cases is what separates fragile scripts from reliable pipelines.