neural-forge.io

Model Extraction and Distillation Attacks

If you query a closed model enough, you can sometimes reconstruct it. Here is the research on extraction attacks and what it means for proprietary AI.

Creators · Ethics & Society · ~21 min read · Advanced · Researcher · BI5 · Societal Impact · BI3 · Learning

Section 1

Stealing a Model You Can Only Talk To

Imagine a closed API: you send prompts, you get outputs. The model's weights are secret. Can you reconstruct enough of it to build a useful clone? This is model extraction, and the answer for many real scenarios is yes, partially, and for a price that is often surprisingly low.

The basic attack

1. Collect a diverse set of input prompts (for example, sampled from a large language corpus).
2. Query the target model; log (prompt, output) pairs.
3. Train a student model on those pairs via standard fine-tuning or knowledge distillation.
4. The student approximates the teacher's behavior on the queried distribution.
5. If the student matches well enough on downstream tasks, the attack succeeded.
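The five steps above can be sketched end to end on a toy problem. This is a minimal illustration, not a real attack: the "teacher" is a hypothetical three-weight logistic regression hidden behind a function named `query_teacher`, and the student is trained on the teacher's soft outputs with plain gradient descent. All names and numbers here are assumptions made for the example.

```python
import math
import random

# Hypothetical black-box teacher: the attacker may call query_teacher()
# but never reads _SECRET_W or _SECRET_B directly.
_SECRET_W = [2.0, -3.0, 1.0]
_SECRET_B = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def query_teacher(x):
    """Stand-in for an API call: returns the teacher's probability of class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(_SECRET_W, x)) + _SECRET_B)

def collect_pairs(n_queries, dim, rng):
    """Steps 1-2: sample inputs, query the teacher, log (input, output) pairs."""
    inputs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_queries)]
    return [(x, query_teacher(x)) for x in inputs]

def train_student(pairs, dim, epochs=500, lr=0.5):
    """Step 3: fit a student to the soft outputs with batch gradient descent."""
    w, b = [0.0] * dim, 0.0
    n = len(pairs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, target in pairs:
            # Gradient of cross-entropy with a soft target is (prediction - target).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def agreement(w, b, dim, rng, n_test=1000):
    """Steps 4-5: fraction of fresh inputs where student and teacher agree on the label."""
    same = 0
    for _ in range(n_test):
        x = [rng.uniform(-1, 1) for _ in range(dim)]
        student_label = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5
        same += student_label == (query_teacher(x) >= 0.5)
    return same / n_test

rng = random.Random(0)
pairs = collect_pairs(n_queries=200, dim=3, rng=rng)
student_w, student_b = train_student(pairs, dim=3)
fidelity = agreement(student_w, student_b, dim=3, rng=rng)
print(f"label agreement on fresh inputs: {fidelity:.3f}")
```

The agreement score is only meaningful on inputs drawn from the same distribution the attacker queried, which mirrors step 4: the student approximates the teacher on the queried distribution, not everywhere.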

Real results

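Tramèr et al. 2016: "Stealing Machine Learning Models via Prediction APIs" extracted near-equivalent copies of models, including decision trees, from ML-as-a-service platforms for under $10
Orekondy et al. 2019: Knockoff Nets extracted image classifiers from commercial APIs
Krishna et al. 2020: extracted BERT-based natural language models through their APIs, even when the queries were nonsensical text
Carlini et al. 2024: "Stealing Part of a Production Language Model" recovered the final embedding projection layer (up to symmetries) of OpenAI's ada and babbage models via API queries, and the hidden dimension of gpt-3.5-turbo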
Various 2023-2024 papers: fine-tuning smaller open models on outputs of GPT-4 produced surprisingly capable students
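One intuition behind last-layer attacks is that every logit vector is a linear image of a much smaller hidden state, so a stack of logit vectors has rank equal to the hidden dimension. The sketch below shows that idea on a toy model. It is an illustration under strong assumptions, not the published method: `query_logits` is a hypothetical API that returns full, noise-free logits, the matrices are small integer Vandermonde tables chosen so the ranks are known, and exact rational arithmetic stands in for the singular value analysis a real attack would need on floating-point outputs.

```python
from fractions import Fraction

VOCAB_SIZE = 12
_HIDDEN_DIM = 4  # secret: the number the attacker wants to recover

# Hypothetical toy model: a final projection W (vocab x hidden) applied to a
# hidden state. Vandermonde-style entries guarantee both factors are full rank.
_W = [[(i + 1) ** j for j in range(_HIDDEN_DIM)] for i in range(VOCAB_SIZE)]

def query_logits(prompt_id):
    """Stand-in for an API that returns the full logit vector for a prompt."""
    hidden = [(prompt_id + 2) ** j for j in range(_HIDDEN_DIM)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in _W]

def matrix_rank(rows):
    """Exact rank via Gaussian elimination over rational numbers."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

# More queries than the hidden dimension, fewer than the vocabulary size.
logit_matrix = [query_logits(p) for p in range(10)]
recovered_dim = matrix_rank(logit_matrix)
print("estimated hidden dimension:", recovered_dim)  # prints 4
```

Ten queries against a 12-token vocabulary yield a 10 x 12 matrix whose rank is only 4, which reveals the hidden width even though no weight was ever read directly.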

What extraction costs

Compare the options

Target	Approximate cost (USD)	Quality of copy
Small classification API	$10-$1,000	Often near parity
Mid-size chat model via outputs	$1,000-$100,000	Useful downstream model, not identical
Frontier chat model (full match)	Not currently feasible	Practical distillation reaches roughly 70-90% of the teacher's benchmark performance
Last-layer parameters	$100-$1,000 (when the API exposes enough logit information)	Final layer only, recovered up to symmetries
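The dollar figures in the table come down to simple arithmetic on a query budget: number of queries, tokens per query, and price per token. The helper below is a rough sketch of that calculation. The function name and every number are hypothetical, not any provider's actual pricing, and the estimate ignores retries, rate-limit delays, and the compute needed to train the student.

```python
def extraction_cost_usd(n_queries, tokens_per_query, usd_per_1k_tokens):
    """Rough API bill for a query budget (API usage only, no training compute)."""
    total_tokens = n_queries * tokens_per_query
    return total_tokens / 1000 * usd_per_1k_tokens

# Hypothetical scenario: 100,000 queries of about 500 tokens each,
# billed at an assumed $0.002 per 1,000 tokens.
cost = extraction_cost_usd(n_queries=100_000, tokens_per_query=500, usd_per_1k_tokens=0.002)
print(f"estimated API cost: ${cost:,.2f}")

# The same helper shows how cost scales with the query budget.
for budget in (10_000, 1_000_000, 10_000_000):
    scaled = extraction_cost_usd(budget, 500, 0.002)
    print(f"{budget:>12,} queries -> ${scaled:,.2f}")
```

Cost grows linearly with the query budget, so the attacker's real question is how many queries the student needs before its quality plateaus.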

Why this matters legally and strategically

Distillation may violate API terms of service (many providers now explicitly prohibit training competing models on outputs)
The open-model landscape includes many models whose quality traces in part to GPT-4-class teachers
Export controls on a closed model are only as effective as the model is hard to extract
Defensive watermarking of outputs can help prove a downstream model was distilled
Lab competition dynamics shift when the leader can be approximated cheaply

Defenses

Rate limiting: cap the number of queries per account
Monitoring: flag accounts whose queries look like systematic probing of the input distribution
Output perturbation: add small noise to logits, at some cost to output quality
Watermarking: embed a signal in outputs that a distilled model would inherit
Legal: terms of service, account suspensions, and potentially lawsuits (OpenAI has publicly alleged that competitors distilled its models)
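Of the defenses above, rate limiting is the simplest to make concrete. The sketch below is one possible shape for a per-account sliding-window cap. The class name `QueryBudgetLimiter` and its parameters are made up for this example, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from collections import defaultdict, deque

class QueryBudgetLimiter:
    """Allow at most max_queries per account within a sliding time window."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        # account id -> timestamps of accepted queries, oldest first
        self._history = defaultdict(deque)

    def allow(self, account_id, now):
        """Return True and record the query if the account is under its cap."""
        timestamps = self._history[account_id]
        # Drop timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_queries:
            return False
        timestamps.append(now)
        return True

limiter = QueryBudgetLimiter(max_queries=3, window_seconds=60)
decisions = [limiter.allow("acct-1", now=t) for t in (0, 10, 20, 30, 61)]
print(decisions)  # [True, True, True, False, True]

# Caps are tracked per account, so a second account starts with a fresh budget.
other_account_ok = limiter.allow("acct-2", now=30)
```

The last line also shows the main weakness: an attacker who spreads queries across many accounts gets a fresh budget for each one, which is why rate limiting is usually paired with monitoring.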

“When a model is exposed via API, you should assume that a lower-fidelity copy of it exists somewhere you do not control.”
Nicholas Carlini, Google DeepMind

Key terms in this lesson: model extraction, distillation attack, query budget

The big idea: APIs leak. Extraction is not always cheap, but for most intermediate-quality models it is cheaper than people think. The field is adjusting, and so are the contracts, but the underlying game is adversarial and ongoing.
