Lesson 854 of 2116
Iterative Amplification
Break a hard task into smaller subtasks. Solve each with an AI helper. Combine the answers. Repeat. That is iterative amplification, a blueprint for supervising things humans can't check alone.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. Humans Plus Helpers
2. Iterated amplification
3. HCH
4. Decomposition
Section 1
Humans Plus Helpers
Paul Christiano's 2018 framework begins with a thought experiment called HCH, a recursive acronym: Humans Consulting HCH. Imagine you can summon copies of yourself to answer small subquestions, and those copies can summon more copies. In the limit, a carefully managed tree of humans answers questions none of them could answer alone.
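The thought experiment can be sketched as plain recursion. This is a toy sketch, not anything from Christiano's writing: the question is finding the largest number in a list, chosen so that "small enough to answer alone" and "combine the answers" have concrete meanings.

```python
def hch(question):
    """Humans Consulting HCH, on a toy question: find the largest number.

    A single human can compare two numbers unaided; anything harder
    is split and posed to fresh copies of the same human.
    """
    if len(question) <= 2:
        return max(question)  # small enough to answer alone
    mid = len(question) // 2
    # Summon two copies of yourself; each copy may summon more
    # copies in turn, forming a tree of consultations.
    return max(hch(question[:mid]), hch(question[mid:]))
```

No human in the tree ever compares more than two numbers, yet the tree as a whole answers a question about arbitrarily long lists.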
From HCH to trained models
1. Start: a human who can solve only small tasks
2. Build: an AI distilled to imitate the human on those small tasks
3. Amplify: let the human decompose a bigger task into smaller ones, each solved by the AI
4. Distill again: train a new AI to imitate the amplified system
5. Repeat: each iteration handles slightly harder problems
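The five steps above can be sketched on a toy problem: summing a tuple of numbers, where the "human" can only sum tuples of length two. All names are illustrative, and perfect distillation is assumed (a real system would train a model to approximate the amplified system, with the distillation loss noted below).

```python
def human_solve(task):
    # Step 1: the unaided human handles only small tasks.
    assert len(task) <= 2, "too hard for the human alone"
    return sum(task)

def amplify(model, task):
    # Step 3: the human decomposes a bigger task, delegates the
    # pieces to the current model, and combines the subtask answers.
    if len(task) <= 2:
        return human_solve(task)
    mid = len(task) // 2
    return model(task[:mid]) + model(task[mid:])

def iterated_amplification(rounds):
    # Step 2: the seed model imitates the human on small tasks.
    model = human_solve
    for _ in range(rounds):
        prev = model
        # Steps 4-5: assume perfect distillation, so each new model
        # exactly reproduces the amplified (human + prev) system.
        model = lambda task, prev=prev: amplify(prev, task)
    return model

model = iterated_amplification(3)   # handles tuples up to length 16
print(model((1, 2, 3, 4, 5)))       # 15
```

Each round doubles the task size the system can handle, while the human's own job (sum two numbers, split a tuple in half) never gets harder.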
Known difficulties
- Distillation loss: the AI never perfectly imitates the human tree
- Decomposition failure: some questions resist being split into small pieces
- Coordination: subtask answers may conflict in hard-to-reconcile ways
- Capability overhang: the distilled model might develop shortcuts the human tree did not
“The core bet is that if I can break a hard task into tasks I can do, and I can align an AI to do each one, I have aligned an AI to do the whole thing.”
The big idea: amplification treats alignment as a property that must survive a training loop, not a gate you pass once. That framing influences a lot of modern safety work.
Related lessons
Keep going
Creators · 50 min
AI Alignment: The Actual Technical Problem
Alignment is not a vibes debate. It is a concrete technical problem about getting systems to pursue goals we actually want. Here is what researchers work on when they say they work on alignment.
Creators · 40 min
Jailbreak Case Studies: What Actually Broke
Abstract jailbreak theory is less useful than real cases. Here are the techniques that worked on production models, what they taught us, and what is still unsolved.
Creators · 55 min
Alignment: The Full Technical Picture
What alignment actually is as a research program, how it is done in practice, what the open problems are, and where the actual papers live. A model that is always helpful will help you do harmful things.
