Break a hard task into smaller subtasks. Solve each with an AI helper. Combine the answers. Repeat. That is iterative amplification, a blueprint for supervising things humans can't check alone.
Paul Christiano's 2018 framework begins with a thought experiment called HCH: Humans Consulting HCH. Imagine you can summon copies of yourself to answer small subquestions, and those copies can summon more copies. In the limit, a carefully managed tree of humans answers questions none of them could answer alone.
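The recursion can be sketched in a few lines. This is a toy illustration, not a real HCH implementation: the splitting rule (break on " and ") and all function names are hypothetical stand-ins for a human decomposing a task.

```python
# Toy sketch of HCH-style recursion: a question is split into
# subquestions, each handled by a "copy" that may split further.
# The splitting rule and all names here are illustrative only.

def decompose(question: str) -> list[str]:
    # Stand-in for a human breaking a task apart: split on " and ".
    parts = question.split(" and ")
    return parts if len(parts) > 1 else []

def answer_directly(question: str) -> str:
    # Leaf case: a question small enough for one human to answer.
    return f"answer({question})"

def hch(question: str, depth: int = 2) -> str:
    """Each node consults copies of itself on subquestions, up to a depth budget."""
    subs = decompose(question) if depth > 0 else []
    if not subs:
        return answer_directly(question)
    return "; ".join(hch(s, depth - 1) for s in subs)

print(hch("prove lemma A and check step B"))
# prints: answer(prove lemma A); answer(check step B)
```

The depth budget stands in for the "carefully managed" part: without some cap, nothing guarantees the tree of consultations terminates.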
The core bet is that if I can break a hard task into tasks I can do, and I can align an AI to do each one, I have aligned an AI to do the whole thing.
— Paul Christiano, Alignment Research Center
The big idea: amplification treats alignment as a property that must survive a training loop, not a gate you pass once. That framing influences a lot of modern safety work.
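The amplify-then-distill loop the questions below probe can be sketched as a toy iteration. Everything here is a hypothetical stand-in: the dict-backed "model", the stubbed two-way decomposition, and lossless "distillation" bear no resemblance to real training, but the loop structure matches the lesson's description.

```python
# Toy sketch of iterative amplification: amplify a model by letting it
# consult itself on subquestions, distill the amplified behavior back
# into a cheaper model, and repeat. The dict "model" and every name
# here are illustrative stand-ins, not a real training setup.

def amplify(model: dict, question: str) -> str:
    # Amplify: decompose (stubbed as two fixed subquestions), consult
    # the current model on each part, and combine the answers.
    subs = [f"{question}.1", f"{question}.2"]
    return " + ".join(model.get(s, f"base({s})") for s in subs)

def distill(model: dict, questions: list) -> dict:
    # Distill: "train" a new model to imitate the amplified system by
    # recording its answers. Real distillation is lossy; this is not,
    # which is exactly the gap one quiz question asks about.
    return {q: amplify(model, q) for q in questions}

model = {}
for _ in range(3):  # repeat: each distilled model seeds the next amplification
    model = distill(model, ["q", "q.1", "q.2"])
print(model["q"])
```

After a few rounds, the answer to "q" is built from deeper subquestion answers than any single amplify call could reach, which is the sense in which capability grows while (ideally) alignment is preserved at each step.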
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-safety2-iterative-amplification-creators
In the HCH framework (Humans Consulting HCH), what does the second 'H' represent?
What is the primary purpose of the 'distillation' step in iterative amplification?
According to the core idea presented, alignment in iterative amplification should be viewed as:
What does 'decomposition' refer to in the iterative amplification framework?
Which of these is identified as a known difficulty in implementing iterative amplification?
In the iterative amplification loop, what happens when alignment is preserved through each step?
What is 'capability overhang' as described in the lesson?
What risk arises when subtask answers from different parts of the human tree conflict with each other?
Which of these techniques is mentioned as borrowing from the ideas behind iterative amplification?
The lesson suggests that iterative amplification treats alignment as a property that must:
In the iterative process, what happens during the 'amplify' step?
If distillation loss is significant, what is the primary consequence for the amplified system?
What distinguishes chain-of-thought training from pure iterative amplification?
The lesson notes that no frontier model is trained with pure iterated amplification. What reason is suggested?
What must be true for the final system in iterative amplification to remain aligned as it handles harder problems?