What if you have to supervise a student smarter than you? OpenAI's 2023 paper asked that question by using GPT-2 to supervise GPT-4. The results were surprising: the student outperformed its teacher.
Human teachers will eventually be the weak party. If we want to keep supervising models that are better than us at most tasks, we need to know: does a stronger model, trained on weaker labels, stay stuck at the weak supervisor's ceiling, or can it generalize past it?
In December 2023, OpenAI published Weak-to-Strong Generalization. The authors simulated this future problem with today's models, using GPT-2 as the weak supervisor and GPT-4 as the strong student. GPT-2 generated labels, some of them wrong, and GPT-4 was fine-tuned on them.
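The setup above (a weak model labels data, a stronger model trains on those labels) can be mimicked at toy scale. The sketch below is an analogy, not the paper's method: a hypothetical "weak supervisor" flips 25% of the true labels at random, and a logistic-regression "strong student" is trained only on those noisy labels, then scored against the hidden ground truth. All function names and the `flip_rate` value are illustrative assumptions. Random, unsystematic errors are the easy case; a real weak model's mistakes are systematic, and a strong student can learn to copy them.

```python
import math
import random

def make_data(n, rng):
    """Toy 2-D points; the hidden ground truth is 1 when x0 + x1 > 0."""
    points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    labels = [1 if x0 + x1 > 0 else 0 for x0, x1 in points]
    return points, labels

def weak_supervisor(labels, flip_rate, rng):
    """Hypothetical weak teacher: flips each true label with probability flip_rate."""
    return [1 - y if rng.random() < flip_rate else y for y in labels]

def train_student(points, targets, lr=1.0, steps=200):
    """Hypothetical strong student: logistic regression fit by full-batch
    gradient descent on the weak labels only (it never sees the truth)."""
    w0 = w1 = b = 0.0
    n = len(points)
    for _ in range(steps):
        g0 = g1 = gb = 0.0
        for (x0, x1), t in zip(points, targets):
            p = 1.0 / (1.0 + math.exp(-(w0 * x0 + w1 * x1 + b)))
            err = p - t  # gradient of log loss w.r.t. the logit
            g0 += err * x0
            g1 += err * x1
            gb += err
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        b -= lr * gb / n
    return w0, w1, b

def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

rng = random.Random(0)
points, truth = make_data(2000, rng)
weak_labels = weak_supervisor(truth, flip_rate=0.25, rng=rng)

# Train on the first 1500 weak labels; evaluate on the last 500 against the truth.
train_x, test_x = points[:1500], points[1500:]
train_weak, test_weak = weak_labels[:1500], weak_labels[1500:]
test_truth = truth[1500:]

w0, w1, b = train_student(train_x, train_weak)
student_preds = [1 if w0 * x0 + w1 * x1 + b > 0 else 0 for x0, x1 in test_x]

weak_acc = accuracy(test_weak, test_truth)
student_acc = accuracy(student_preds, test_truth)
print(f"weak supervisor accuracy: {weak_acc:.2f}")
print(f"strong student accuracy:  {student_acc:.2f}")
```

The student beats its teacher here because random label flips largely cancel out, while the student's simple linear form matches the true boundary. That is the intuition behind "generalizing past" a weak supervisor, in its most favorable form.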
Weak-to-strong is not the answer. It is evidence that the shape of the answer might exist.
— Collin Burns, paper co-author, interview (paraphrased)
The big idea: smarter students might partially teach themselves from weaker teachers. That is encouraging but far from sufficient for the real superhuman case.
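"Partially" can be made precise. The weak-to-strong paper measures it with a "performance gap recovered" (PGR) ratio: how much of the gap between the weak supervisor and a strong model trained on ground truth the weakly supervised student closes. A minimal sketch of that arithmetic follows; the function name and the accuracies in the example are hypothetical, chosen only to show the calculation, and are not results from the paper.

```python
def performance_gap_recovered(weak, weak_to_strong, strong_ceiling):
    """Fraction of the weak-to-ceiling gap recovered by the strong student.

    0.0 means the student is stuck at the weak supervisor's level;
    1.0 means it matches a strong model trained on ground-truth labels.
    """
    gap = strong_ceiling - weak
    if gap <= 0:
        raise ValueError("strong ceiling must exceed weak performance")
    return (weak_to_strong - weak) / gap

# Hypothetical accuracies, for illustration only.
print(performance_gap_recovered(weak=0.60, weak_to_strong=0.75, strong_ceiling=0.90))  # ≈ 0.5
```

A PGR near 0 would mean the student merely imitates its teacher; anything between 0 and 1 is the "partial" self-teaching described above.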
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-safety2-weak-to-strong-creators
What is the core idea behind "Weak-to-Strong Generalization"?
Which term best describes a foundational idea in "Weak-to-Strong Generalization"?
A learner studying Weak-to-Strong Generalization would need to understand which concept?
Which of these is directly relevant to Weak-to-Strong Generalization?
Which of the following is a key point about Weak-to-Strong Generalization?
Which of these does NOT belong in a discussion of Weak-to-Strong Generalization?
Which statement is accurate regarding Weak-to-Strong Generalization?
Which of these does NOT belong in a discussion of Weak-to-Strong Generalization?
What is the key insight about "Why this is a clue" in the context of Weak-to-Strong Generalization?
What is the key insight about "The superalignment question" in the context of Weak-to-Strong Generalization?
Which statement accurately describes an aspect of Weak-to-Strong Generalization?
What does working with Weak-to-Strong Generalization typically involve?
Which of the following is true about Weak-to-Strong Generalization?
Which best describes the scope of "Weak-to-Strong Generalization"?
Which section heading best belongs in a lesson about Weak-to-Strong Generalization?