CSV and JSON are the bread and butter of data interchange. Handling them well means handling their edge cases well.
CSV is the universal tabular format. JSON is the universal structured format. Almost every data pipeline reads or writes one or both. Knowing the pitfalls prevents hours of debugging.
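To make the tabular-versus-structured distinction concrete, here is a minimal sketch using only the standard library. The `record` dictionary is a made-up example, not data from the lesson. It round-trips the same record through both formats and shows what comes back.

```python
import csv
import io
import json

# Hypothetical record: an ID with leading zeros, a float, and a nested list.
record = {"id": "007", "score": 9.5, "tags": ["a", "b"]}

# CSV round trip: the writer calls str() on each value, and the reader
# returns every field as a plain string.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
buf.seek(0)
csv_row = next(csv.DictReader(buf))

# JSON round trip: numbers, strings, and nesting are preserved.
json_row = json.loads(json.dumps(record))

print(csv_row)   # all values are strings, including the former list
print(json_row)  # same types as the original record
```

CSV carries no type information, so types have to be restored when reading. JSON keeps basic types and nesting, at the cost of being less convenient for flat tables.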
    import pandas as pd

    # Wrong encoding: common error for European or Latin American data
    try:
        df = pd.read_csv('data.csv')
    except UnicodeDecodeError:
        df = pd.read_csv('data.csv', encoding='latin-1')

    # Custom delimiter (common: semicolon in German/French Excel)
    df = pd.read_csv('data.csv', sep=';')

    # Keep leading zeros on ID-like columns
    df = pd.read_csv('data.csv', dtype={'zip_code': str, 'phone': str})

    # Parse dates at load time
    df = pd.read_csv('data.csv', parse_dates=['signup_date'])

    # Treat sentinel values as NaN
    df = pd.read_csv('data.csv', na_values=['', 'N/A', '-999', 'unknown'])

    # Large files: stream in chunks
    for chunk in pd.read_csv('huge.csv', chunksize=10_000):
        process(chunk)

CSV read patterns that avoid pain

    # index=False is almost always what you want
    df.to_csv('output.csv', index=False)

    # Use UTF-8 (default but be explicit)
    df.to_csv('output.csv', encoding='utf-8', index=False)

    # Keep floats readable
    df.to_csv('output.csv', float_format='%.3f', index=False)

    # Compress for big files
    df.to_csv('output.csv.gz', compression='gzip', index=False)

CSV write patterns

    import json
    import pandas as pd

    # Simple read
    with open('data.json') as f:
        data = json.load(f)

    # Line-delimited JSON (common for big data)
    df = pd.read_json('data.jsonl', lines=True)

    # Deeply nested JSON into a flat table
    records = [
        {'user': {'id': 1, 'name': 'Alice'}, 'events': [{'type': 'click'}]}
    ]
    flat = pd.json_normalize(
        records,
        record_path='events',
        meta=[['user', 'id'], ['user', 'name']]
    )
    print(flat)

Read and flatten JSON

    import json
    import numpy as np

    data = {'scores': [0.1, 0.2, 0.3], 'name': 'test'}

    # Pretty-printed, human-readable
    with open('out.json', 'w') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

    # numpy types crash default json; convert them
    class NPEncoder(json.JSONEncoder):
        def default(self, o):
            if isinstance(o, np.integer):
                return int(o)
            if isinstance(o, np.floating):
                return float(o)
            if isinstance(o, np.ndarray):
                return o.tolist()
            return super().default(o)

    with open('out.json', 'w') as f:
        json.dump(data, f, cls=NPEncoder, indent=2)

Writing JSON with numpy support

The big idea: file formats have personalities. Matching format to use case saves time, and handling encoding/quoting edge cases is what separates fragile scripts from reliable pipelines.
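The quoting edge cases mentioned above can be illustrated with a small sketch that uses the standard-library `csv` module rather than pandas. The sample rows are invented for illustration. It writes a field containing a comma, double quotes, and a newline, then compares naive comma splitting with a real CSV parser.

```python
import csv
import io

# Hypothetical rows: one field has a comma, another has quotes and a newline.
rows = [
    ["name", "note"],
    ["Müller, Anna", 'said "hi"\nthen left'],
]

# Write with the csv module so that awkward fields get quoted and escaped.
buf = io.StringIO(newline="")
csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerows(rows)
text = buf.getvalue()

# Naive parsing: splitting on newlines and commas tears quoted fields apart.
naive = [line.split(",") for line in text.splitlines()]

# Proper parsing: csv.reader understands quotes, escaped quotes, and
# newlines embedded inside a quoted field.
parsed = list(csv.reader(io.StringIO(text, newline="")))

print(len(naive), "naive rows vs", len(parsed), "parsed rows")
print(parsed == rows)
```

The same reasoning applies when reading with pandas: letting a real CSV parser handle quoting is more reliable than splitting strings by hand.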
6 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-reading-writing-csv-json
What is the main idea of "Reading and Writing CSV and JSON in Python"?
Which concept is most central to "Reading and Writing CSV and JSON in Python"?
What should a careful learner remember about "When to prefer JSON over CSV"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about CSV be treated?
Name one way to verify an AI answer about CSV.