Scalable Oversight: Watching Models Smarter Than You
When AI outputs get too long, too technical, or too fast for humans to check, how do you know it is doing the right thing? Scalable oversight is the research program trying to answer that.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. The Bandwidth Problem
2. Scalable oversight
3. RLHF
4. Human feedback
Section 1
The Bandwidth Problem
Human feedback is the backbone of modern alignment. Raters read a model's answer and upvote or downvote it. That works well when the answer is short and the rater is qualified. It breaks down when the answer is a 30-page research paper, a 10,000-line codebase, or a claim in a field no rater actually knows.
Why it matters
- Models are outpacing the speed at which humans can read their work
- Domains like biology and math exceed most raters' expertise
- Rater fatigue causes quality to drop across long sessions
- If the model is wrong in ways the rater can't see, training reinforces the error
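The last bullet is the core failure mode. A toy sketch makes it concrete: assume a hypothetical rater with a fixed "reading budget" who labels pairwise preferences for RLHF-style training (all names here are illustrative, not a real API). An error that sits past the budget is invisible, so the preference label can reward the flawed answer.

```python
# Toy sketch (hypothetical names throughout): a rater with a fixed
# reading budget labels pairwise preferences. Errors beyond the
# budget are invisible, so a flawed answer can win the comparison.

def rater_prefers(answer_a: str, answer_b: str, budget: int = 100) -> str:
    """Return 'a' or 'b' based only on what the rater actually reads."""
    seen_a, seen_b = answer_a[:budget], answer_b[:budget]
    # The rater can only penalize errors they can see.
    errors_a = seen_a.count("[WRONG]")
    errors_b = seen_b.count("[WRONG]")
    return "a" if errors_a <= errors_b else "b"

short_correct = "The proof holds because the lemma applies."
long_flawed = "A long, confident derivation... " * 10 + "[WRONG] step here."

# The flaw sits past the rater's budget, so the label rewards the
# flawed answer -- and training reinforces the error.
label = rater_prefers(long_flawed, short_correct)
print(label)  # 'a': the long, flawed answer wins
```

With a larger budget (say 400 characters), the same rater spots the flaw and the label flips, which is exactly the bandwidth that long outputs exhaust.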
Main approaches
1. Debate: two models argue opposing sides, a human judges
2. Iterated amplification: break hard tasks into smaller pieces a human can check
3. Recursive reward modeling: train a reward model on easier subtasks, then use it to evaluate harder ones
4. Process supervision: score the reasoning steps, not just the final answer
5. Critique models: one model finds flaws in another's output
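To see what process supervision buys over outcome-only scoring, here is a minimal sketch under stated assumptions: the function names are hypothetical, and the per-step scores are supplied by hand where a real system would use a trained step-level reward model.

```python
# Minimal sketch of outcome vs. process supervision. All names are
# illustrative assumptions, not a real library API.

def outcome_reward(final_answer: str, target: str) -> float:
    """Outcome supervision: score only the final answer."""
    return 1.0 if final_answer == target else 0.0

def process_reward(steps: list[str], step_scores: list[float]) -> float:
    """Process supervision: aggregate per-step scores from a verifier.

    In practice step_scores would come from a trained step-level
    reward model; here they are given directly to stay self-contained.
    """
    assert len(steps) == len(step_scores)
    return sum(step_scores) / len(step_scores)

steps = [
    "Expand (x+1)^2 to x^2 + 2x + 1",   # valid step
    "Set x^2 + 2x + 1 = 0, so x = 1",   # invalid step
    "Therefore the root is x = 1",      # follows from the bad step
]
scores = [1.0, 0.0, 0.0]  # a step verifier flags where reasoning breaks

print(outcome_reward("x = 1", "x = -1"))   # 0.0: wrong, but no signal where
print(process_reward(steps, scores))       # ~0.33: points at the faulty step
```

The outcome signal says only "wrong"; the process signal localizes the failure to a step a human (or a weaker model) can actually check, which is the whole point of supervising the reasoning rather than the result.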
“The hope is that we can use AI to help us align smarter AI, bootstrapping supervision up the capability curve.”
The big idea: alignment at scale is not just better labels. It is a research bet that supervision itself can be amplified without losing the human anchor.
