CSV and JSON are the bread and butter of data interchange. Handling them well means handling their edge cases well.
CSV is the universal tabular format. JSON is the universal structured format. Almost every data pipeline reads or writes one or both. Knowing the pitfalls prevents hours of debugging.
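To make the tabular versus structured distinction concrete, here is a minimal sketch that uses only the standard library's `csv` and `json` modules. The sample record and its field names are invented for illustration. It round-trips the same record through both formats in memory: CSV hands every value back as a string, while JSON preserves numbers and booleans.

```python
import csv
import io
import json

# Hypothetical sample record, invented for this illustration.
record = {"zip_code": "02139", "visits": 3, "active": True}

# Round-trip through CSV (in memory, so no files are touched).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
buf.seek(0)
csv_row = next(csv.DictReader(buf))

# Round-trip through JSON.
json_row = json.loads(json.dumps(record))

print(csv_row)   # {'zip_code': '02139', 'visits': '3', 'active': 'True'}
print(json_row)  # {'zip_code': '02139', 'visits': 3, 'active': True}
```

This is why CSV readers need hints such as `dtype` and `parse_dates`: the file itself carries no type information, so the reader has to guess or be told.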
import pandas as pd
# Wrong encoding: common error for European or Latin American data
try:
    df = pd.read_csv('data.csv')
except UnicodeDecodeError:
    df = pd.read_csv('data.csv', encoding='latin-1')
# Custom delimiter (common: semicolon in German/French Excel)
df = pd.read_csv('data.csv', sep=';')
# Keep leading zeros on ID-like columns
df = pd.read_csv('data.csv', dtype={'zip_code': str, 'phone': str})
# Parse dates at load time
df = pd.read_csv('data.csv', parse_dates=['signup_date'])
# Treat sentinel values as NaN
df = pd.read_csv('data.csv', na_values=['', 'N/A', '-999', 'unknown'])
# Large files: stream in chunks
# (process is a placeholder for your own per-chunk function)
for chunk in pd.read_csv('huge.csv', chunksize=10_000):
    process(chunk)

CSV read patterns that avoid pain

# index=False is almost always what you want
df.to_csv('output.csv', index=False)
# Use UTF-8 (default but be explicit)
df.to_csv('output.csv', encoding='utf-8', index=False)
# Keep floats readable
df.to_csv('output.csv', float_format='%.3f', index=False)
# Compress for big files
df.to_csv('output.csv.gz', compression='gzip', index=False)

CSV write patterns

import json
import pandas as pd
# Simple read
with open('data.json') as f:
    data = json.load(f)
# Line-delimited JSON (common for big data)
df = pd.read_json('data.jsonl', lines=True)
# Deeply nested JSON into a flat table
records = [
    {'user': {'id': 1, 'name': 'Alice'}, 'events': [{'type': 'click'}]}
]
flat = pd.json_normalize(
    records,
    record_path='events',
    meta=[['user', 'id'], ['user', 'name']]
)
print(flat)

Read and flatten JSON

import json
import numpy as np
data = {'scores': [0.1, 0.2, 0.3], 'name': 'test'}
# Pretty-printed, human-readable
with open('out.json', 'w') as f:
    json.dump(data, f, indent=2, ensure_ascii=False)
# Most NumPy types (np.int64, np.float32, ndarray) raise TypeError
# in the default encoder; convert them to plain Python values
class NPEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, np.integer):
            return int(o)
        if isinstance(o, np.floating):
            return float(o)
        if isinstance(o, np.ndarray):
            return o.tolist()
        return super().default(o)
with open('out.json', 'w') as f:
    json.dump(data, f, cls=NPEncoder, indent=2)

Writing JSON with numpy support

The big idea: file formats have personalities. Matching format to use case saves time, and handling encoding/quoting edge cases is what separates fragile scripts from reliable pipelines.
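As a rough illustration of the quoting and encoding edge cases mentioned above, the sketch below uses the standard library's `csv` module on in-memory text. The sample rows are invented for this example. It writes fields containing a comma, embedded quotes, a newline, and non-ASCII characters, passes the text through an explicit UTF-8 encode and decode, and checks that a proper CSV parser recovers the rows while a naive `split(',')` does not.

```python
import csv
import io

# Hypothetical rows with a comma, embedded quotes, a newline, and non-ASCII text.
rows = [
    ["name", "comment"],
    ["Zoë", 'said "hi", then left'],
    ["José", "line one\nline two"],
]

# newline="" leaves line endings to the csv module.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# Encode and decode explicitly as UTF-8, as an explicit encoding= on open() would.
raw = text.encode("utf-8")
decoded = raw.decode("utf-8")

# A real CSV parser honors the quoting and returns the original rows.
parsed = list(csv.reader(io.StringIO(decoded, newline="")))

# Naive splitting breaks on the comma inside the quoted field.
naive = text.splitlines()[1].split(",")

print(parsed == rows)  # True
print(len(naive))      # 3 pieces instead of 2 fields
```

The same reasoning applies when working with files: opening them with `newline=''` and an explicit `encoding='utf-8'` keeps the `csv` module in charge of quoting and line endings.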
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-reading-writing-csv-json
What is the core idea behind "Reading and Writing CSV and JSON in Python"?
Which term best describes a foundational idea in "Reading and Writing CSV and JSON in Python"?
A learner studying Reading and Writing CSV and JSON in Python would need to understand which concept?
Which of these is directly relevant to Reading and Writing CSV and JSON in Python?
What is the key insight about "When to prefer JSON over CSV" in the context of Reading and Writing CSV and JSON in Python?
What is the recommended tip about "Ground your practice in fundamentals" in the context of Reading and Writing CSV and JSON in Python?
Which statement accurately describes an aspect of Reading and Writing CSV and JSON in Python?
What does working with Reading and Writing CSV and JSON in Python typically involve?
Which best describes the scope of "Reading and Writing CSV and JSON in Python"?
Which section heading best belongs in a lesson about Reading and Writing CSV and JSON in Python?
Which of the following is a concept covered in Reading and Writing CSV and JSON in Python?