Emergence, Capability Forecasting, and Safety

Emergent abilities make AI both more exciting and more dangerous. How do labs forecast what the next model will do — and what happens when they are wrong?

45 min · Reviewed 2026

The Forecasting Problem

Investors, regulators, and safety teams all want to know: what will the next model be able to do? If abilities emerge in jumps, the question is harder than it sounds. You cannot measure a capability that does not exist yet.

Two views of emergence

View	Argument
Real phenomenon	Abilities appear suddenly at scale thresholds
Measurement artifact	Smooth underlying progress, hidden by binary metrics
Likely both	Some abilities genuinely appear abruptly; others improve smoothly on a log scale

Schaeffer et al. (2023) argued that many reported emergent abilities disappear when discontinuous metrics such as exact-match accuracy are replaced with continuous ones. Follow-up work by other labs, however, found residual sudden jumps even after metric smoothing. The honest answer: it depends on the task.
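
The measurement-artifact argument can be made concrete with a toy model. The sketch below assumes a per-token accuracy that rises linearly in log model size and an answer that only counts if all of its tokens are right. The curve, the sizes, and the 20-token answer length are invented for illustration; they are not taken from Schaeffer et al. or from any real model.

```python
# Toy illustration: smooth per-token skill vs. all-or-nothing exact match.
# Every number here is an assumption made up for the example.

def per_token_accuracy(log10_params: float) -> float:
    """Hypothetical smooth skill curve, linear in log10(model size)."""
    # Assumed: 0.50 at 10^7 parameters, rising 0.12 per decade, clamped to [0, 1].
    p = 0.50 + 0.12 * (log10_params - 7.0)
    return max(0.0, min(1.0, p))

def exact_match(p_token: float, answer_len: int) -> float:
    """Chance that all answer_len tokens are correct, assuming independence."""
    return p_token ** answer_len

def sweep(answer_len: int = 20):
    """Return (log10_params, per-token accuracy, exact-match rate) per size."""
    rows = []
    for log10_params in (7, 8, 9, 10, 11):
        p = per_token_accuracy(log10_params)
        rows.append((log10_params, p, exact_match(p, answer_len)))
    return rows

if __name__ == "__main__":
    print("log10(params)  per-token  exact-match")
    for size, p, em in sweep():
        print(f"{size:>13}  {p:9.2f}  {em:11.4f}")
```

Under these assumptions the per-token column climbs by the same amount at every step, while the exact-match column stays near zero for four sizes and then rises sharply at the largest one. The same underlying progress looks smooth or sudden depending on which column is plotted.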

Capability evaluations

Static benchmarks: MMLU, GPQA, HumanEval — broad but easy to saturate
Task-specific suites: SWE-Bench for coding, AIME for math, long-context retrieval
Agentic evals: METR tasks, CyBench, SWE-Lancer for multi-step work
Dangerous-capability evaluations: probing specific dangerous skills like weapon synthesis reasoning
Scaling evals: the same benchmark re-run at multiple model sizes to fit a trend across scale
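
As a minimal sketch of the scaling-eval idea in the list above, the helper below takes scores for one benchmark at several model sizes and reports the gain per tenfold increase in size, plus whether the benchmark looks saturated. The function name, the 0.95 ceiling, and the example scores are all hypothetical.

```python
import math

def summarize_scaling(results, ceiling=0.95):
    """Summarize one benchmark across model sizes.

    results: list of (parameter_count, score) pairs, score in [0, 1].
    ceiling: assumed score above which the benchmark is treated as saturated.
    """
    results = sorted(results)
    report = []
    # Compare each model size with the next larger one.
    for (n0, s0), (n1, s1) in zip(results, results[1:]):
        decades = math.log10(n1 / n0)
        report.append({
            "from": n0,
            "to": n1,
            "gain_per_decade": (s1 - s0) / decades,
            "saturated": s1 >= ceiling,
        })
    return report

if __name__ == "__main__":
    # Invented scores for illustration only.
    scores = [(1e8, 0.31), (1e9, 0.45), (1e10, 0.62), (1e11, 0.96)]
    for row in summarize_scaling(scores):
        print(row)
```

A gain per decade that suddenly grows is a hint of a jump worth investigating; a saturated flag is a hint that the benchmark can no longer distinguish the next model from this one.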

Responsible scaling policies

Frontier labs like Anthropic, OpenAI, and Google DeepMind publish policies committing to specific evaluations at capability thresholds (the Responsible Scaling Policy, the Preparedness Framework, and the Frontier Safety Framework, respectively). Anthropic's Responsible Scaling Policy defines AI Safety Levels (ASL) and requires mitigations proportional to the risk tier.
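
The general shape of such a policy, where an evaluation result crossing a threshold triggers a required set of mitigations, can be sketched as a lookup. Everything in the block below is a placeholder: the tier names, thresholds, and mitigations are hypothetical and are not taken from Anthropic's ASL definitions or from any other lab's policy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    name: str           # hypothetical label, not an official ASL level
    min_score: float    # eval score at or above which this tier applies
    mitigations: tuple  # safeguards required before deployment

# Ordered from lowest to highest risk. All values are invented placeholders.
TIERS = (
    Tier("tier-1", 0.0, ("standard safeguards",)),
    Tier("tier-2", 0.2, ("standard safeguards", "misuse monitoring")),
    Tier("tier-3", 0.5, ("standard safeguards", "misuse monitoring",
                         "hardened weight security")),
)

def required_tier(eval_scores: dict) -> Tier:
    """Pick the tier triggered by the highest dangerous-capability score."""
    worst = max(eval_scores.values())
    triggered = [t for t in TIERS if worst >= t.min_score]
    return triggered[-1]

if __name__ == "__main__":
    scores = {"cyber": 0.10, "bio": 0.55}  # invented eval results
    tier = required_tier(scores)
    print(tier.name, tier.mitigations)
```

The design choice worth noticing is that the worst single score sets the tier: a model that is safe on most evaluations but crosses one threshold still has to meet the higher bar.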

Red teaming and dangerous capabilities

Pre-deployment evals target CBRN (chemical, biological, radiological, nuclear) uplift, cyber offense, and autonomous replication
External red teams get early model access under NDA
Pre-registered evals with hold-out data reduce the risk of training-time contamination
Findings are documented and, increasingly, published
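
Hold-out data only helps if it really is held out. One common way to check is to look for long word sequences that an eval item shares with training text. The block below is a minimal sketch of that idea, assuming whitespace tokenization and an 8-word window; the function names are hypothetical, and real pipelines typically use proper tokenizers and much larger corpora.

```python
def ngrams(text: str, n: int = 8) -> set:
    """All runs of n consecutive lowercase words in text."""
    tokens = text.lower().split()
    # Texts shorter than n words yield an empty set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(eval_items, training_docs, n=8):
    """Fraction of eval items sharing at least one n-gram with training docs.

    Returns (rate, flagged_items).
    """
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    flagged = [item for item in eval_items if ngrams(item, n) & train_grams]
    return len(flagged) / len(eval_items), flagged

if __name__ == "__main__":
    # Tiny invented example; n=3 so the short strings can overlap.
    train = ["the quick brown fox jumps over the lazy dog"]
    evals = ["Explain why the quick brown fox is fast", "What is two plus two"]
    rate, flagged = contamination_rate(evals, train, n=3)
    print(rate, flagged)
```

An overlap check like this catches verbatim leakage only. Paraphrased or translated copies of an eval item would pass it, which is one reason pre-registration and access controls are used alongside it.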

Capability forecasting techniques

Fit power laws at smaller scales and extrapolate (often overconfident)
Use proxy tasks correlated with the capability of interest
Run prediction markets internally to aggregate researcher beliefs
Simulate agents in structured environments and measure their task horizon (how long a task they can complete reliably)
Study mechanistic signals — does a circuit for the capability exist before behavior is robust?
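
The first technique in the list above, fitting a power law at small scale and extrapolating, can be sketched with ordinary least squares in log-log space. The block assumes loss follows the form a * compute^(-b); the data points are synthetic and generated from that exact form, so the fit is far cleaner than anything measured on real models.

```python
import math

def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**(-b) in log-log space.

    Returns (a, b).
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

def predict(a, b, compute):
    """Loss predicted by the fitted power law at a given compute budget."""
    return a * compute ** (-b)

if __name__ == "__main__":
    # Synthetic small-scale runs drawn from loss = 10 * C**-0.1 (assumed).
    compute = [1e18, 1e19, 1e20, 1e21]
    loss = [10 * c ** -0.1 for c in compute]
    a, b = fit_power_law(compute, loss)
    print(f"a={a:.3f}, b={b:.3f}")
    print(f"extrapolated loss at 1e23: {predict(a, b, 1e23):.4f}")
```

The extrapolated number comes with no error bar, which is where the overconfidence creeps in: a fit through four points says nothing about whether the trend bends beyond the fitted range, and a smooth loss forecast does not by itself say when a downstream ability will appear.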

The scariest part of a capability evaluation is the tasks nobody remembered to include.
— A safety researcher

The big idea: emergence makes forecasting genuinely hard. Responsible labs publish policies, run structured evals, and accept that surprise is a load-bearing assumption of the field.

End-of-lesson check

15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-creators-emergence-and-capability-forecasting

What is the core idea behind "Emergence, Capability Forecasting, and Safety"?
1. Emergent abilities make AI both more exciting and more dangerous. How do labs forecast what the next model will do — and what happens when they are wrong?
2. Compare contiguous KV cache fragmentation to paged allocation under varied reque…
3. compute optimal
4. factuality
Which term best describes a foundational idea in "Emergence, Capability Forecasting, and Safety"?
1. ASL
2. emergence
3. red team
4. capability eval
A learner studying Emergence, Capability Forecasting, and Safety would need to understand which concept?
1. emergence
2. red team
3. ASL
4. capability eval
Which of these is directly relevant to Emergence, Capability Forecasting, and Safety?
1. emergence
2. ASL
3. capability eval
4. red team
Which of the following is a key point about Emergence, Capability Forecasting, and Safety?
1. Static benchmarks: MMLU, GPQA, HumanEval — broad but easy to saturate
2. Task-specific suites: SWE-Bench for coding, AIME for math, long-context retrieval
3. Agentic evals: METR tasks, CyBench, SWE-Lancer for multi-step work
4. Capability evaluations: probing specific dangerous skills like weapon synthesis reasoning
Which of these does NOT belong in a discussion of Emergence, Capability Forecasting, and Safety?
1. Task-specific suites: SWE-Bench for coding, AIME for math, long-context retrieval
2. Compare contiguous KV cache fragmentation to paged allocation under varied reque…
3. Agentic evals: METR tasks, CyBench, SWE-Lancer for multi-step work
4. Static benchmarks: MMLU, GPQA, HumanEval — broad but easy to saturate
Which statement is accurate regarding Emergence, Capability Forecasting, and Safety?
1. External red teams get early model access under NDA
2. Pre-registered evals with hold-out data prevent training-time contamination
3. Pre-deployment evals target CBRN uplift, cyber offense, autonomous replication
4. Findings are documented and, increasingly, published
Which of these does NOT belong in a discussion of Emergence, Capability Forecasting, and Safety?
1. Pre-registered evals with hold-out data prevent training-time contamination
2. Compare contiguous KV cache fragmentation to paged allocation under varied reque…
3. External red teams get early model access under NDA
4. Pre-deployment evals target CBRN uplift, cyber offense, autonomous replication
What is the key insight about "ASL in brief" in the context of Emergence, Capability Forecasting, and Safety?
1. ASL-1 and ASL-2 cover current models with standard safeguards.
2. Compare contiguous KV cache fragmentation to paged allocation under varied reque…
3. compute optimal
4. factuality
What is the key insight about "Unknown unknowns remain the hard problem" in the context of Emergence, Capability Forecasting, and Safety?
1. Compare contiguous KV cache fragmentation to paged allocation under varied reque…
2. You can test for capabilities you imagine. The ones you do not imagine are the ones that have historically caught labs o…
3. compute optimal
4. factuality
What is the recommended tip about "Ground your practice in fundamentals" in the context of Emergence, Capability Forecasting, and Safety?
1. Compare contiguous KV cache fragmentation to paged allocation under varied reque…
2. compute optimal
3. Every AI capability has an underlying mechanism. Understanding that mechanism tells you where it'll fail — which is more…
4. factuality
Which statement accurately describes an aspect of Emergence, Capability Forecasting, and Safety?
1. Compare contiguous KV cache fragmentation to paged allocation under varied reque…
2. compute optimal
3. factuality
4. Investors, regulators, and safety teams all want to know: what will the next model be able to do? If abilities emerge in jumps, the question…
What does working with Emergence, Capability Forecasting, and Safety typically involve?
1. The Schaeffer et al. (2023) paper argued that many reported emergent abilities disappear when you use continuous metrics.
2. Compare contiguous KV cache fragmentation to paged allocation under varied reque…
3. compute optimal
4. factuality
Which of the following is true about Emergence, Capability Forecasting, and Safety?
1. Compare contiguous KV cache fragmentation to paged allocation under varied reque…
2. Frontier labs like Anthropic, OpenAI, and Google DeepMind publish policies committing to specific evaluations at capability thresholds.
3. compute optimal
4. factuality
Which best describes the scope of "Emergence, Capability Forecasting, and Safety"?
1. It is unrelated to foundations workflows
2. It applies only to the opposite beginner tier
3. It focuses on Emergent abilities make AI both more exciting and more dangerous. How do labs forecast what the next
4. It was deprecated in 2024 and no longer relevant
