Responsible Scaling Policies Explained
RSPs are the frontier labs' self-imposed rules for what capability thresholds trigger which safeguards. Here is what they commit to, what they hedge on, and what the enforcement problem is.
Lesson map
What this lesson covers, in order:
1. A Self-Imposed Brake
2. Responsible Scaling Policy
3. ASL
4. Preparedness Framework
Section 1
A Self-Imposed Brake
A Responsible Scaling Policy (RSP) is a frontier lab's public commitment to pause or add safeguards when models cross defined capability thresholds. Anthropic introduced the concept in September 2023. OpenAI published its analog, the Preparedness Framework, in December 2023. Google DeepMind followed with its Frontier Safety Framework in May 2024. Meta, xAI, and Cohere now have some version of their own.
The Anthropic RSP architecture (v2.2, 2025)
Anthropic's RSP defines AI Safety Levels (ASL), modeled on the biosafety levels (BSL) used for dangerous pathogens. Each level specifies a capability threshold and the safeguards it requires; the sketch after the list below shows how the safeguards accumulate.
- ASL-1: No meaningful catastrophic risk (2018-era models, narrow chess AIs)
- ASL-2: Early signs of dangerous capability but unreliable (most current LLMs)
- ASL-3: Substantial uplift on CBRN misuse or meaningful autonomous AI R&D. Triggered May 2025 for Anthropic's frontier models. Requires stronger weight security and deployment restrictions.
- ASL-4: Not yet defined concretely; associated with qualitative escalations in autonomy and misuse potential
- ASL-5: Placeholder for far-future concerns
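To make the ladder concrete, here is a minimal Python sketch of its accumulate-as-you-climb structure. The level numbers follow the RSP; the safeguard strings and the function name are illustrative placeholders, not Anthropic's actual required measures.

```python
# A toy ASL-style safeguard ladder. Level numbers follow the RSP; the
# safeguard strings are illustrative placeholders, not Anthropic's
# actual required measures.
ASL_LADDER = {
    1: ["basic release hygiene"],
    2: ["misuse filtering", "vendor-grade weight security"],
    3: ["hardened weight security", "deployment restrictions"],
    4: ["TBD: escalated autonomy and security controls"],  # not yet defined
    5: ["TBD: far-future placeholder"],
}

def required_safeguards(asl_level: int) -> list[str]:
    """Safeguards accumulate: a model at ASL-n must satisfy every rung
    up to and including n, so a higher level only ever adds measures."""
    return [
        safeguard
        for level in sorted(ASL_LADDER)
        if level <= asl_level
        for safeguard in ASL_LADDER[level]
    ]

print(required_safeguards(3))  # everything from levels 1 through 3
```

The design point the sketch captures: escalation never removes a lower rung's protections, it only stacks new ones on top.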
What capabilities actually trigger the next level?
Crossing any one of these can be enough to escalate; see the sketch after the list.
1. CBRN uplift: can the model meaningfully help a novice with a bioweapon?
2. Cyber uplift: does it enable attacks that were previously infeasible?
3. Autonomous replication: can it copy itself across machines without human help?
4. AI R&D acceleration: can it fully automate junior-researcher AI work?
5. Persuasion: does it meaningfully exceed human baselines in influence operations?
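A hypothetical sketch of that any-single-trigger rule: escalation is a max over independent checks, not an average, so strong showings on four axes cannot offset a fifth. It assumes each evaluation reduces to a pass/fail verdict; real evaluations are messier, each threshold is defined in eval-specific terms, and the trigger names below are shorthand, not official identifiers.

```python
# Hypothetical trigger check: crossing any single capability threshold
# is enough to escalate to the next safety level.
TRIGGERS = (
    "cbrn_uplift",
    "cyber_uplift",
    "autonomous_replication",
    "ai_rnd_acceleration",
    "persuasion",
)

def next_level_triggered(eval_results: dict[str, bool]) -> bool:
    """True if any capability evaluation crossed its threshold."""
    return any(eval_results.get(trigger, False) for trigger in TRIGGERS)

# Example: only the CBRN evaluation crosses its threshold -> still escalates.
results = dict.fromkeys(TRIGGERS, False)
results["cbrn_uplift"] = True
assert next_level_triggered(results)
```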
OpenAI's Preparedness Framework
OpenAI rates models across four risk categories (cybersecurity, CBRN, persuasion, model autonomy) on a low/medium/high/critical scale. Models scoring high can be deployed only with mitigations in place; models scoring critical cannot be trained further until mitigations work. The pre-mitigation score is what matters: mitigating a critical model back down to medium does not clear you to train the next one without review. The sketch below makes the gating explicit.
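That logic splits into two predicates: post-mitigation scores gate deployment, and pre-mitigation scores gate further training. A minimal sketch of that reading; the category names come from the framework, but the function names and the exact comparison rules are my assumptions, not OpenAI's published procedure.

```python
from enum import IntEnum

class Severity(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

CATEGORIES = ("cybersecurity", "cbrn", "persuasion", "model_autonomy")

def can_deploy(post_mitigation: dict[str, Severity]) -> bool:
    # One reading of the lesson text: a "high" model deploys only once
    # mitigations bring it down, i.e. post-mitigation scores sit at
    # medium or below in every category. (Assumption, not OpenAI's text.)
    return all(post_mitigation[c] <= Severity.MEDIUM for c in CATEGORIES)

def can_train_further(pre_mitigation: dict[str, Severity]) -> bool:
    # "Critical cannot be trained further until mitigations work": the
    # pre-mitigation score gates the next run, so mitigating a critical
    # model back to medium does not by itself clear further training.
    return all(pre_mitigation[c] < Severity.CRITICAL for c in CATEGORIES)

pre = dict.fromkeys(CATEGORIES, Severity.MEDIUM)
pre["model_autonomy"] = Severity.CRITICAL
post = dict.fromkeys(CATEGORIES, Severity.MEDIUM)

print(can_deploy(post))        # True: every post-mitigation score <= medium
print(can_train_further(pre))  # False: a pre-mitigation critical blocks it
```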
Compare: the three big-lab frameworks
| Dimension | Anthropic RSP | OpenAI Preparedness | DeepMind FSF |
|---|---|---|---|
| Structure | Levels (ASL-1..5) | Categories × severity | Critical Capability Levels |
| Triggered by | Capability thresholds | Pre-mitigation scores | Evaluation outcomes |
| Can pause training? | Yes | Yes (critical) | Yes |
| External input | Board, policy team | Safety Advisory Group | Internal review |
| Weight security level | Escalates with ASL | Escalates with severity | Escalates |
What RSPs get right
- Specificity: concrete capability thresholds, not vague commitments
- Pre-commitment: easier to honor restrictions announced before a race
- Safeguard ladders: security and deployment measures scale with capability
- Publicness: the document can be evaluated and critiqued by outsiders
- Interoperability: the Seoul Summit's Frontier AI Safety Commitments use RSP-style language
What critics hammer
- Self-imposed: no external enforcement, can be amended by the company at will
- Revisable mid-race: Anthropic moved some thresholds between RSP versions
- No peer consistency: what triggers ASL-3 at Anthropic is not what triggers high at OpenAI
- Eval quality: thresholds are only as good as the evaluations probing them — and evals are still immature
- Incentive problem: the company running the evals has a business interest in passing
The relationship to law
The EU AI Act refers to GPAI Code of Practice signatories and treats RSP-style commitments as a presumption of compliance for some systemic-risk obligations. The UK AISI's MOUs with labs include pre-release evaluation rights that rely on the labs' own classifications. US Executive Order 14110 (2023) required dual-use foundation model reporting; the Trump administration rescinded it in early 2025, though some evaluation requirements were preserved. Soft commitments and hard law are converging, slowly.
“A commitment is not a guarantee. It is a promise plus a mechanism for catching yourself when you are about to break it.”
The big idea: RSPs are the frontier labs' admission that some capabilities should change how a model is handled. They are imperfect, self-enforced, and strictly better than nothing. Knowing their architecture lets you read any new safety announcement against the actual document it cites.
Related lessons
- The EU AI Act: The Global Floor, Whether You Like It or Not (45 min). The EU AI Act is the most sweeping AI law in the world. It will set the compliance floor for anyone who ships globally. Here is the architecture, the timeline, and what it gets right and wrong.
- Labor and AI: What the Data Actually Says (45 min). Most predictions about AI and jobs are either panic or dismissal. Here is what the best evidence through 2025 actually shows, including what is overstated.
- Constitutional AI: A Deep Dive on Anthropic's Approach (45 min). What a constitution actually contains, how the training loop works, where the research is now, and the honest trade-offs.
