Break a hard task into smaller subtasks. Solve each with an AI helper. Combine the answers. Repeat. That is iterative amplification, a blueprint for supervising things humans can't check alone.
Paul Christiano's 2018 framework begins with a thought experiment called HCH: Humans Consulting HCH. Imagine you can summon copies of yourself to answer small subquestions, and those copies can summon more copies. In the limit, a carefully managed tree of humans answers questions none of them could answer alone.
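The recursion can be sketched in a few lines. This is a toy illustration, not a real HCH implementation: the splitting rule (break on " and ") and all function names are hypothetical stand-ins for a human decomposing a task.

```python
# Toy sketch of HCH-style recursion: a question is split into
# subquestions, each handled by a "copy" that may split further.
# The splitting rule and all names here are illustrative only.

def decompose(question: str) -> list[str]:
    # Stand-in for a human breaking a task apart: split on " and ".
    parts = question.split(" and ")
    return parts if len(parts) > 1 else []

def answer_directly(question: str) -> str:
    # Leaf case: a question small enough for one human to answer.
    return f"answer({question})"

def hch(question: str, depth: int = 2) -> str:
    """Each node consults copies of itself on subquestions, up to a depth budget."""
    subs = decompose(question) if depth > 0 else []
    if not subs:
        return answer_directly(question)
    return "; ".join(hch(s, depth - 1) for s in subs)

print(hch("prove lemma A and check step B"))
# prints: answer(prove lemma A); answer(check step B)
```

The depth budget stands in for the "carefully managed" part: without some cap, nothing guarantees the tree of consultations terminates.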
The core bet is that if I can break a hard task into tasks I can do, and I can align an AI to do each one, I have aligned an AI to do the whole thing.
— Paul Christiano, Alignment Research Center
The big idea: amplification treats alignment as a property that must survive a training loop, not a gate you pass once. That framing influences a lot of modern safety work.
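The amplify-then-distill loop the questions below probe can be sketched as a toy iteration. Everything here is a hypothetical stand-in: the dict-backed "model", the stubbed two-way decomposition, and lossless "distillation" bear no resemblance to real training, but the loop structure matches the lesson's description.

```python
# Toy sketch of iterative amplification: amplify a model by letting it
# consult itself on subquestions, distill the amplified behavior back
# into a cheaper model, and repeat. The dict "model" and every name
# here are illustrative stand-ins, not a real training setup.

def amplify(model: dict, question: str) -> str:
    # Amplify: decompose (stubbed as two fixed subquestions), consult
    # the current model on each part, and combine the answers.
    subs = [f"{question}.1", f"{question}.2"]
    return " + ".join(model.get(s, f"base({s})") for s in subs)

def distill(model: dict, questions: list) -> dict:
    # Distill: "train" a new model to imitate the amplified system by
    # recording its answers. Real distillation is lossy; this is not,
    # which is exactly the gap one quiz question asks about.
    return {q: amplify(model, q) for q in questions}

model = {}
for _ in range(3):  # repeat: each distilled model seeds the next amplification
    model = distill(model, ["q", "q.1", "q.2"])
print(model["q"])
```

After a few rounds, the answer to "q" is built from deeper subquestion answers than any single amplify call could reach, which is the sense in which capability grows while (ideally) alignment is preserved at each step.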
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-safety2-iterative-amplification-creators
In the HCH framework (Humans Consulting HCH), what does the second 'H' represent?
What is the primary purpose of the 'distillation' step in iterative amplification?
According to the core idea presented, alignment in iterative amplification should be viewed as:
What does 'decomposition' refer to in the iterative amplification framework?
Which of these is identified as a known difficulty in implementing iterative amplification?
In the iterative amplification loop, what happens when alignment is preserved through each step?
What is 'capability overhang' as described in the lesson?
What risk arises when subtask answers from different parts of the human tree conflict with each other?
Which of these techniques is mentioned as borrowing from the ideas behind iterative amplification?
The lesson suggests that iterative amplification treats alignment as a property that must:
In the iterative process, what happens during the 'amplify' step?
If distillation loss is significant, what is the primary consequence for the amplified system?
What distinguishes chain-of-thought training from pure iterative amplification?
The lesson notes that no frontier model is trained with pure iterated amplification. What reason is suggested?
What must be true for the final system in iterative amplification to remain aligned as it handles harder problems?