Anonymization and Why It Often Fails
Removing names does not make data anonymous. Combinations of a few seemingly innocent fields can re-identify nearly anyone.
What this lesson covers
1. The Illusion of Anonymity
2. Anonymization
3. Re-identification
4. Differential privacy
The Illusion of Anonymity
In 2006, Netflix released a supposedly anonymized dataset of 100 million movie ratings for a public prize competition. Researchers de-anonymized users by cross-referencing the ratings with public IMDb reviews. The same year, the AOL search-log leak exposed identifiable users from supposedly scrubbed queries. Anonymization is hard, and naive anonymization is almost always broken.
The Sweeney result
Latanya Sweeney showed that roughly 87% of the US population is uniquely identified by just three fields: 5-digit ZIP code, birth date, and sex. She used exactly this kind of linkage, joining "anonymized" hospital records against a public voter roll, to re-identify the medical record of Massachusetts Governor William Weld.
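A rough back-of-envelope calculation shows why so few fields suffice; every figure below is a loose approximation:

```python
# Back-of-envelope: why three mundane fields single people out.
# All figures are loose approximations, not exact counts.
zip_codes = 42_000          # roughly how many 5-digit ZIPs the US has
birth_dates = 365 * 80      # a birth date across ~80 plausible years
sexes = 2
buckets = zip_codes * birth_dates * sexes    # ~2.45 billion combinations
us_population = 330_000_000
print(f"{buckets / us_population:.1f}x more buckets than people")
```

With several times more buckets than people, and people spread very unevenly across them, most occupied buckets hold exactly one person.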
Why naive anonymization fails
- Quasi-identifiers combine: age + ZIP + job is often unique (see the sketch after this list)
- Auxiliary data: an attacker can cross-reference with public sources
- High-dimensional data (location traces, browsing history) is almost always unique
- Rare attributes (unusual diseases, rare job titles) trivially identify
- Linking attacks: merge two 'anonymous' datasets and identities emerge
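The first failure mode is easy to verify on any table: count how many records are unique on a quasi-identifier combination. A minimal sketch with toy data (the column names and values are invented for illustration):

```python
import pandas as pd

# Six toy records with three innocent-looking fields.
df = pd.DataFrame({
    "age": [34, 34, 51, 28, 34, 51],
    "zip": ["02139", "02139", "60614", "02139", "02140", "60614"],
    "job": ["nurse", "teacher", "pilot", "nurse", "nurse", "pilot"],
})

# How many records are unique on the quasi-identifier combination?
group_sizes = df.groupby(["age", "zip", "job"]).size()
unique_rows = (group_sizes == 1).sum()
print(f"{unique_rows} of {len(df)} records are unique on (age, zip, job)")
```

Even in this tiny table, four of six records are unique; in real high-dimensional data the fraction approaches one.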
Formal techniques
| Technique | Strength | Weakness |
|---|---|---|
| Pseudonymization | Simple | Weakest, easily reversed |
| k-anonymity | Each record's quasi-identifier values match at least k-1 other records | Vulnerable to homogeneity attacks |
| l-diversity | Adds variety in sensitive fields | Fails against skewed distributions |
| t-closeness | Sensitive distributions match the full dataset | Reduces utility substantially |
| Differential privacy | Mathematical guarantee | Adds noise, reduces accuracy |
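The k-anonymity row translates directly into a check: a table is k-anonymous for a set of quasi-identifiers when its smallest quasi-identifier group has at least k members. A sketch, with an invented k_of helper and toy columns, that also shows how generalizing raises k:

```python
import pandas as pd

def k_of(df, quasi_identifiers):
    """The k for which the table is k-anonymous: its smallest group size."""
    return int(df.groupby(quasi_identifiers).size().min())

df = pd.DataFrame({
    "age": [34, 36, 51, 53],
    "zip": ["02139", "02141", "60614", "60615"],
})
print(k_of(df, ["age", "zip"]))            # 1: every record is unique

# Generalize the quasi-identifiers and k rises.
df["zip3"] = df["zip"].str[:3]             # 5-digit ZIP -> 3-digit prefix
df["age_band"] = (df["age"] // 10) * 10    # exact age -> decade band
print(k_of(df, ["age_band", "zip3"]))      # 2: records now blend in pairs
```

The homogeneity weakness is visible here too: if both records in a group share the same sensitive value, k-anonymity hides identity but not the secret.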
Differential privacy: the gold standard
Differential privacy, formalized by Cynthia Dwork and colleagues in 2006, adds carefully calibrated noise so that the output of an analysis barely changes whether any one individual's data is included or not. Apple, Google, and the US Census Bureau all use differential privacy in production.
A minimal Laplace-mechanism example
```python
import numpy as np

def dp_count(true_count, epsilon=1.0):
    """Release a count under the Laplace mechanism.

    Noise scale = sensitivity / epsilon; a count has sensitivity 1.
    """
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# 1000 people have condition X. DP releases a noisy count that is
# still useful in aggregate but hides any single individual.
noisy = dp_count(true_count=1000, epsilon=1.0)
print(f"Reported count: {noisy:.0f}")
```
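To see the "barely changes" claim concretely, compare releases from two neighboring datasets that differ by exactly one person. This sketch reuses the dp_count function defined above:

```python
# Neighboring datasets: identical except that one person is present
# (true count 1000) or absent (true count 999). Five releases of each:
print(sorted(round(dp_count(1000)) for _ in range(5)))
print(sorted(round(dp_count(999)) for _ in range(5)))
# Typical output: the two lists overlap heavily, so an observer cannot
# tell which dataset produced a given release. That is the guarantee.
```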
Practical guidance
1. Do not release raw data, even with names removed.
2. Aggregate to coarse categories (age ranges, not exact ages).
3. Generalize quasi-identifiers (a 5-digit ZIP becomes a 3-digit prefix), as in the sketch after this list.
4. Suppress rare combinations.
5. Use differential privacy for any computation on sensitive data.
6. Assume attackers have auxiliary data you don't know about.
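Steps 2 through 4 are mechanical enough to sketch. The column names, bin edges, and threshold k below are all illustrative assumptions, not a prescribed recipe:

```python
import pandas as pd

# Toy records; columns and values are invented for illustration.
df = pd.DataFrame({
    "age": [34, 29, 51, 34],
    "zip": ["02139", "02139", "60614", "02139"],
})

# Step 2: aggregate exact ages into coarse ranges.
df["age_range"] = pd.cut(df["age"], bins=[17, 30, 45, 65],
                         labels=["18-30", "31-45", "46-65"])

# Step 3: generalize the quasi-identifier (5-digit ZIP -> 3-digit prefix).
df["zip3"] = df["zip"].str[:3]

# Step 4: suppress rare combinations (groups smaller than k records).
k = 2
sizes = df.groupby(["age_range", "zip3"], observed=True)["age"].transform("size")
released = df.loc[sizes >= k, ["age_range", "zip3"]]
print(released)
```

Only the two records that blend into a group of at least k survive suppression; the singletons are withheld rather than released.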
The big idea: anonymization is harder than it looks. Formal techniques like differential privacy are the only reliable path. If you cannot afford formal guarantees, do not release the data.