Lesson 6 of 2116
Emergence, Capability Forecasting, and Safety
Emergent abilities make AI both more exciting and more dangerous. How do labs forecast what the next model will do — and what happens when they are wrong?
Lesson map
What this lesson covers
Learning path
The main moves in order
1. The Forecasting Problem
2. Emergence
3. Capability forecasting
4. Red teaming
Section 1
The Forecasting Problem
Investors, regulators, and safety teams all want to know: what will the next model be able to do? If abilities emerge in jumps, the question is harder than it sounds. You cannot measure a capability that does not exist yet.
Two views of emergence
Compare the options
| View | Argument |
|---|---|
| Real phenomenon | Abilities appear suddenly at scale thresholds |
| Measurement artifact | Smooth underlying progress, hidden by binary metrics |
| Likely both | Some abilities genuinely snap in, others are log-smooth |
Schaeffer et al. (2023) argued that many reported emergent abilities disappear when you score with continuous metrics instead of all-or-nothing ones. But follow-up work by other labs found residual sudden jumps even after metric smoothing. The honest answer is: it depends on the task.
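The measurement-artifact effect is easy to reproduce in a toy model. The sketch below assumes, purely for illustration, that per-token accuracy improves smoothly with log-compute; an exact-match metric over a 10-token answer then appears to "snap in" even though nothing discontinuous happened underneath:

```python
import math

# Toy model (an illustrative assumption, not Schaeffer et al.'s code):
# per-token accuracy improves smoothly with log-compute, but a binary
# exact-match metric over a 10-token answer is p**10, which looks emergent.
def per_token_accuracy(log_compute):
    # smooth logistic improvement -- an assumed functional form
    return 1 / (1 + math.exp(-(log_compute - 5)))

for log_c in range(1, 10):
    p = per_token_accuracy(log_c)
    exact_match = p ** 10  # all-or-nothing scoring over 10 tokens
    print(f"log-compute={log_c}: token-acc={p:.3f}, exact-match={exact_match:.4f}")
```

The continuous metric climbs gradually at every scale; the binary metric sits near zero and then shoots up, which is exactly the pattern that gets reported as emergence.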
Capability evaluations
- Static benchmarks: MMLU, GPQA, HumanEval — broad but easy to saturate
- Task-specific suites: SWE-Bench for coding, AIME for math, long-context retrieval
- Agentic evals: METR tasks, CyBench, SWE-Lancer for multi-step work
- Dangerous-capability evals: probing specific high-risk skills like weapon-synthesis reasoning
- Scaling evals: same benchmark re-run at multiple model sizes to interpolate
Responsible scaling policies
Frontier labs like Anthropic, OpenAI, and Google DeepMind publish policies committing to specific evaluations at capability thresholds. Anthropic's Responsible Scaling Policy defines AI Safety Levels (ASL) and requires mitigations proportional to the risk tier.
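As a mental model only — the real ASL definitions are qualitative, and the numeric cutoffs and mitigation names below are invented — the commitment structure works like a lookup from eval results to required mitigations:

```python
# Invented illustration of threshold-triggered commitments.
# Real responsible-scaling policies define tiers qualitatively;
# these scores, cutoffs, and mitigation labels are hypothetical.
TIERS = [
    (0.0, "ASL-2", "baseline security and deployment review"),
    (0.5, "ASL-3", "hardened security, expanded red teaming"),
    (0.8, "ASL-4", "do not deploy until stronger safeguards exist"),
]

def required_tier(danger_eval_score: float) -> tuple[str, str]:
    """Return the highest tier whose threshold the score meets."""
    best = TIERS[0]
    for threshold, name, mitigation in TIERS:
        if danger_eval_score >= threshold:
            best = (threshold, name, mitigation)
    return best[1], best[2]

print(required_tier(0.6))  # ('ASL-3', 'hardened security, expanded red teaming')
```

The key design property is monotonicity: a higher measured capability can only trigger stronger mitigations, never weaker ones.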
Red teaming and dangerous capabilities
1. Pre-deployment evals target CBRN uplift, cyber offense, and autonomous replication
2. External red teams get early model access under NDA
3. Pre-registered evals with held-out data prevent training-time contamination
4. Findings are documented and, increasingly, published
Capability forecasting techniques
- Fit power laws at smaller scales and extrapolate (often overconfident)
- Use proxy tasks correlated with the capability of interest
- Run prediction markets internally to aggregate researcher beliefs
- Simulate agents in structured environments and measure horizon
- Study mechanistic signals — does a circuit for the capability exist before behavior is robust?
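The first technique in the list — fit a power law at small scale, then extrapolate — can be sketched in a few lines. The data points here are fabricated for illustration, and the caveat in the list (often overconfident) is exactly the failure mode: three clean points say nothing about whether the trend holds two orders of magnitude out.

```python
import math

# Hedged sketch: fit loss(N) = a * N**(-b) to small-scale runs,
# then extrapolate. The (params, eval-loss) points are made up.
small_runs = [(1e6, 4.2), (1e7, 3.1), (1e8, 2.3)]

# Linear least squares in log-log space: log L = log a - b * log N
xs = [math.log(n) for n, _ in small_runs]
ys = [math.log(loss) for _, loss in small_runs]
k = len(xs)
mx, my = sum(xs) / k, sum(ys) / k
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = -slope                      # power-law exponent
a = math.exp(my - slope * mx)   # power-law coefficient

predicted = a * (1e10) ** (-b)  # extrapolated loss at 10B params
print(f"fit: b={b:.3f}, extrapolated loss at 1e10 params={predicted:.2f}")
```

A real forecasting exercise would also hold out the largest run to check the fit, and report an interval rather than the point estimate printed here.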
“The scariest part of a capability evaluation is the tasks nobody remembered to include.”
The big idea: emergence makes forecasting genuinely hard. Responsible labs publish policies, run structured evals, and accept that surprise is a load-bearing assumption of the field.
Related lessons
Keep going
Creators · 45 min
Open vs. Closed Models: Philosophy and Strategy
Open-source AI is both a technical movement and a political one. Understand the arguments so you can pick a stack and defend it.
Creators · 55 min
The Three Ingredients: Data, Compute, Algorithms (Capstone)
Every AI breakthrough of the past decade rests on three interacting ingredients. Synthesize everything you have learned into one working model.
Creators · 35 min
How Chatbot Arena Works
The world's most influential 'leaderboard' for AI is not a test — it is humans voting blindly. Here is how that works.
