Reading and Writing CSV and JSON in Python

These two formats are the bread and butter of data interchange. Handling them well means handling edge cases well.

30 min · Reviewed 2026

The Two Formats You Will See Forever

CSV is the universal tabular format. JSON is the universal structured format. Almost every data pipeline reads or writes one or both. Knowing the pitfalls prevents hours of debugging.

CSV: the common gotchas

    import pandas as pd

    # Wrong encoding: common error for European or Latin American data
    try:
        df = pd.read_csv('data.csv')
    except UnicodeDecodeError:
        df = pd.read_csv('data.csv', encoding='latin-1')

    # Custom delimiter (common: semicolon in German/French Excel)
    df = pd.read_csv('data.csv', sep=';')

    # Keep leading zeros on ID-like columns
    df = pd.read_csv('data.csv', dtype={'zip_code': str, 'phone': str})

    # Parse dates at load time
    df = pd.read_csv('data.csv', parse_dates=['signup_date'])

    # Treat sentinel values as NaN
    df = pd.read_csv('data.csv', na_values=['', 'N/A', '-999', 'unknown'])

    # Large files: stream in chunks
    for chunk in pd.read_csv('huge.csv', chunksize=10_000):
        process(chunk)  # process() stands in for your own per-chunk function

CSV read patterns that avoid pain
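To see why the dtype and na_values options above matter, here is a minimal sketch that reads a small in-memory CSV twice. The sample data and the variable names (raw, inferred, typed) are made up for illustration; only read_csv and its dtype and na_values parameters come from the patterns above.

```python
import io

import pandas as pd

# Hypothetical sample: a ZIP code with a leading zero and a -999 sentinel
raw = "zip_code,score\n02134,3.5\n90210,-999\n"

# Default type inference: zip_code is parsed as an integer column
inferred = pd.read_csv(io.StringIO(raw))

# Explicit dtype keeps the text as written; -999 is treated as missing
typed = pd.read_csv(
    io.StringIO(raw),
    dtype={"zip_code": str},
    na_values=["-999"],
)

inferred_zip = str(inferred.loc[0, "zip_code"])
typed_zip = typed.loc[0, "zip_code"]
missing_scores = int(typed["score"].isna().sum())

print(inferred_zip, typed_zip, missing_scores)
```

With default inference the leading zero is gone before any of your own code runs, which is why ID-like columns are worth declaring as str at load time.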

Writing CSV cleanly

    # index=False is almost always what you want
    df.to_csv('output.csv', index=False)

    # Use UTF-8 (default but be explicit)
    df.to_csv('output.csv', encoding='utf-8', index=False)

    # Keep floats readable
    df.to_csv('output.csv', float_format='%.3f', index=False)

    # Compress for big files
    df.to_csv('output.csv.gz', compression='gzip', index=False)

CSV write patterns
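Quoting is the other half of writing CSV cleanly: a field that contains a comma, a quote character, or a line break has to be quoted to survive a round trip. The sketch below uses the standard-library csv module rather than pandas, purely to show the mechanism. The sample rows are invented.

```python
import csv
import io

# Hypothetical rows with the three classic troublemakers:
# an embedded comma, embedded quotes, and an embedded newline
rows = [
    ["id", "comment"],
    ["007", 'She said "hi", then left'],
    ["008", "line one\nline two"],
]

# Write to an in-memory buffer; the default dialect quotes only when needed
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# Read the text back and check that nothing was split or lost
read_back = list(csv.reader(io.StringIO(text, newline="")))

print(text)
print(read_back == rows)
```

Embedded quotes are doubled inside a quoted field, and the multi-line comment stays a single field. pandas' to_csv also quotes only where needed by default, so the same fields should round-trip there as well.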

JSON: structured data

    import json
    import pandas as pd

    # Simple read (JSON files are normally UTF-8, so say so explicitly)
    with open('data.json', encoding='utf-8') as f:
        data = json.load(f)

    # Line-delimited JSON (common for big data)
    df = pd.read_json('data.jsonl', lines=True)

    # Deeply nested JSON into a flat table
    records = [
        {'user': {'id': 1, 'name': 'Alice'}, 'events': [{'type': 'click'}]}
    ]
    flat = pd.json_normalize(
        records,
        record_path='events',
        meta=[['user', 'id'], ['user', 'name']]
    )
    print(flat)

Read and flatten JSON
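Line-delimited JSON is simply one JSON document per line, which is why it can be processed as a stream. As a rough sketch of the idea behind lines=True, the loop below parses such input with only the standard library. The helper name read_jsonl and the sample records are hypothetical.

```python
import io
import json


def read_jsonl(handle):
    """Yield one parsed object per non-blank line of a text stream."""
    for line in handle:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)


# Hypothetical input; in practice this would be an open file handle
jsonl_text = '{"id": 1, "name": "Alice"}\n{"id": 2, "name": "Bob"}\n\n'

records = list(read_jsonl(io.StringIO(jsonl_text)))
print(records)
```

Because the generator holds only one line at a time, the same loop works on files too large to load in one piece.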

Writing JSON

    import json
    import numpy as np

    data = {'scores': [0.1, 0.2, 0.3], 'name': 'test'}

    # Pretty-printed, human-readable
    with open('out.json', 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

    # NumPy arrays and integer scalars raise TypeError in the default
    # encoder; convert them to plain Python types
    class NPEncoder(json.JSONEncoder):
        def default(self, o):
            if isinstance(o, np.integer):
                return int(o)
            if isinstance(o, np.floating):
                return float(o)
            if isinstance(o, np.ndarray):
                return o.tolist()
            return super().default(o)

    # Example payload that actually contains NumPy values
    np_data = {'scores': np.array([0.1, 0.2, 0.3]), 'count': np.int64(3)}

    with open('out.json', 'w', encoding='utf-8') as f:
        json.dump(np_data, f, cls=NPEncoder, indent=2)

Writing JSON with numpy support
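The ensure_ascii=False flag above controls how non-ASCII characters are written. A small sketch with an invented record shows the difference. Both forms parse back to the same value, so the choice is mostly about how readable the file is to a human.

```python
import json

# Hypothetical record with accented characters
record = {"name": "José", "city": "Zürich"}

escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # characters written as-is

# Either form loads back to the original dictionary
same_after_parse = json.loads(escaped) == json.loads(readable) == record

print(escaped)
print(readable)
print(same_after_parse)
```

When writing the readable form to disk, open the file with encoding='utf-8' so the characters are stored in the encoding JSON readers expect.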

The big idea: file formats have personalities. Matching format to use case saves time, and handling encoding/quoting edge cases is what separates fragile scripts from reliable pipelines.
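The encoding edge case can be reproduced without any files. This sketch (the sample string and variable names are illustrative) shows why a latin-1 fallback works as a rescue, and why it should be treated as a guess: latin-1 accepts any byte sequence, so it cannot tell you that it chose wrong.

```python
text = "café"

latin1_bytes = text.encode("latin-1")  # b'caf\xe9'
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'

# Latin-1 bytes are not valid UTF-8: this is the error a fallback catches
try:
    latin1_bytes.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Latin-1 maps every byte to a character, so decoding never fails...
recovered = latin1_bytes.decode("latin-1")

# ...even when the bytes were really UTF-8, which silently yields mojibake
mojibake = utf8_bytes.decode("latin-1")

print(utf8_failed, recovered, mojibake)
```

A reasonable habit is to try UTF-8 first, fall back only on an actual decode error, and spot-check a few accented values after loading.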

End-of-lesson check

6 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-data-reading-writing-csv-json

What is the main idea of "Reading and Writing CSV and JSON in Python"?
1. These two formats are the bread and butter of data interchange. Handling them well means handling edge cases well.
2. Use AI as the final authority for the whole decision
3. Avoid checking the answer once it sounds polished
4. Focus only on speed instead of judgment

Which concept is most central to "Reading and Writing CSV and JSON in Python"?
1. JSON
2. CSV
3. file I/O
4. encoding

What should a careful learner remember about "When to prefer JSON over CSV"?
1. Use AI to draft or organize ideas about CSV, then verify before acting.
2. Skip the context so the tool can guess faster
3. Treat the output as private even after sharing it online
4. Use the answer without checking the source

You want to use AI after this lesson. What is the safest next step?
1. Act immediately because the AI answer is written clearly
2. Use AI for drafting and comparison, but verify before publishing or relying on it.
3. Hide uncertainty so the final answer looks cleaner
4. Use private or sensitive details before checking permission

How should AI output about CSV be treated?
1. As proof that no other source is needed
2. As a replacement for context, consent, or expert review
3. As a draft or helper output that still needs human judgment and verification
4. As something that becomes correct when it sounds confident

Name one way to verify an AI answer about CSV.
