Lesson 6 of 2116
Emergence, Capability Forecasting, and Safety
Emergent abilities make AI both more exciting and more dangerous. How do labs forecast what the next model will do — and what happens when they are wrong?
Lesson map
What this lesson covers
Learning path
The main moves in order
1. The Forecasting Problem
2. Emergence
3. Capability forecasting
4. Red teaming
Section 1
The Forecasting Problem
Investors, regulators, and safety teams all want to know: what will the next model be able to do? If abilities emerge in jumps, the question is harder than it sounds. You cannot measure a capability that does not exist yet.
Two views of emergence
Compare the options
| View | Argument |
|---|---|
| Real phenomenon | Abilities appear suddenly at scale thresholds |
| Measurement artifact | Smooth underlying progress, hidden by binary metrics |
| Likely both | Some abilities genuinely snap in, others are log-smooth |
Schaeffer et al. (2023) argued that many reported emergent abilities disappear when you score with continuous metrics instead of all-or-nothing ones. But follow-up work by other labs found residual sudden jumps even after metric smoothing. The honest answer is: it depends on the task.
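The measurement-artifact effect is easy to reproduce in a toy model. The sketch below assumes, purely for illustration, that per-token accuracy improves smoothly with log-compute; an exact-match metric over a 10-token answer then appears to "snap in" even though nothing discontinuous happened underneath:

```python
import math

# Toy model (an illustrative assumption, not Schaeffer et al.'s code):
# per-token accuracy improves smoothly with log-compute, but a binary
# exact-match metric over a 10-token answer is p**10, which looks emergent.
def per_token_accuracy(log_compute):
    # smooth logistic improvement -- an assumed functional form
    return 1 / (1 + math.exp(-(log_compute - 5)))

for log_c in range(1, 10):
    p = per_token_accuracy(log_c)
    exact_match = p ** 10  # all-or-nothing scoring over 10 tokens
    print(f"log-compute={log_c}: token-acc={p:.3f}, exact-match={exact_match:.4f}")
```

The continuous metric climbs gradually at every scale; the binary metric sits near zero and then shoots up, which is exactly the pattern that gets reported as emergence.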
Capability evaluations
- Static benchmarks: MMLU, GPQA, HumanEval — broad but easy to saturate
- Task-specific suites: SWE-Bench for coding, AIME for math, long-context retrieval
- Agentic evals: METR tasks, CyBench, SWE-Lancer for multi-step work
- Dangerous-capability evals: probing specific high-risk skills like weapon-synthesis reasoning
- Scaling evals: same benchmark re-run at multiple model sizes to interpolate
Responsible scaling policies
Frontier labs like Anthropic, OpenAI, and Google DeepMind publish policies committing to specific evaluations at capability thresholds. Anthropic's Responsible Scaling Policy defines AI Safety Levels (ASL) and requires mitigations proportional to the risk tier.
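As a mental model only — the real ASL definitions are qualitative, and the numeric cutoffs and mitigation names below are invented — the commitment structure works like a lookup from eval results to required mitigations:

```python
# Invented illustration of threshold-triggered commitments.
# Real responsible-scaling policies define tiers qualitatively;
# these scores, cutoffs, and mitigation labels are hypothetical.
TIERS = [
    (0.0, "ASL-2", "baseline security and deployment review"),
    (0.5, "ASL-3", "hardened security, expanded red teaming"),
    (0.8, "ASL-4", "do not deploy until stronger safeguards exist"),
]

def required_tier(danger_eval_score: float) -> tuple[str, str]:
    """Return the highest tier whose threshold the score meets."""
    best = TIERS[0]
    for threshold, name, mitigation in TIERS:
        if danger_eval_score >= threshold:
            best = (threshold, name, mitigation)
    return best[1], best[2]

print(required_tier(0.6))  # ('ASL-3', 'hardened security, expanded red teaming')
```

The key design property is monotonicity: a higher measured capability can only trigger stronger mitigations, never weaker ones.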
Red teaming and dangerous capabilities
1. Pre-deployment evals target CBRN uplift, cyber offense, and autonomous replication
2. External red teams get early model access under NDA
3. Pre-registered evals with held-out data prevent training-time contamination
4. Findings are documented and, increasingly, published
Capability forecasting techniques
- Fit power laws at smaller scales and extrapolate (often overconfident)
- Use proxy tasks correlated with the capability of interest
- Run prediction markets internally to aggregate researcher beliefs
- Simulate agents in structured environments and measure horizon
- Study mechanistic signals — does a circuit for the capability exist before behavior is robust?
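The first technique in the list — fit a power law at small scale, then extrapolate — can be sketched in a few lines. The data points here are fabricated for illustration, and the caveat in the list (often overconfident) is exactly the failure mode: three clean points say nothing about whether the trend holds two orders of magnitude out.

```python
import math

# Hedged sketch: fit loss(N) = a * N**(-b) to small-scale runs,
# then extrapolate. The (params, eval-loss) points are made up.
small_runs = [(1e6, 4.2), (1e7, 3.1), (1e8, 2.3)]

# Linear least squares in log-log space: log L = log a - b * log N
xs = [math.log(n) for n, _ in small_runs]
ys = [math.log(loss) for _, loss in small_runs]
k = len(xs)
mx, my = sum(xs) / k, sum(ys) / k
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = -slope                      # power-law exponent
a = math.exp(my - slope * mx)   # power-law coefficient

predicted = a * (1e10) ** (-b)  # extrapolated loss at 10B params
print(f"fit: b={b:.3f}, extrapolated loss at 1e10 params={predicted:.2f}")
```

A real forecasting exercise would also hold out the largest run to check the fit, and report an interval rather than the point estimate printed here.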
“The scariest part of a capability evaluation is the tasks nobody remembered to include.”
The big idea: emergence makes forecasting genuinely hard. Responsible labs publish policies, run structured evals, and accept that surprise is a load-bearing assumption of the field.
Related lessons
Keep going
Creators · 45 min
Open vs. Closed Models: Philosophy and Strategy
Open-source AI is both a technical movement and a political one. Understand the arguments so you can pick a stack and defend it.
Creators · 55 min
The Three Ingredients: Data, Compute, Algorithms (Capstone)
Every AI breakthrough of the past decade rests on three interacting ingredients. Synthesize everything you have learned into one working model.
Creators · 35 min
How Chatbot Arena Works
The world's most influential 'leaderboard' for AI is not a test — it is humans voting blindly. Here is how that works.
