RSPs are the frontier labs' self-imposed rules for what capability thresholds trigger which safeguards. Here is what they commit to, what they hedge on, and what the enforcement problem is.
A Responsible Scaling Policy (RSP) is a frontier lab's public commitment to pause or add safeguards when models cross defined capability thresholds. Anthropic introduced the concept in September 2023. OpenAI published its analog (Preparedness Framework) in December 2023. Google DeepMind followed with its Frontier Safety Framework in May 2024. Meta, xAI, and Cohere all have some version now.
Anthropic's RSP defines AI Safety Levels (ASL), modeled on the biosafety levels (BSL) used for handling dangerous pathogens. Each level pairs a capability threshold with the safeguards required once a model crosses it.
OpenAI rates models across four risk categories (cybersecurity, CBRN, persuasion, model autonomy) on a low/medium/high/critical scale. Models scoring high can be deployed only with mitigations; models scoring critical cannot be trained further until mitigations bring the risk down. The pre-mitigation score is what counts for scaling: mitigating a critical model back to medium does not clear you to train the next one without review.
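The two gates described above (deployment keyed to post-mitigation scores, further training keyed to pre-mitigation scores) can be sketched as simple predicates. This is purely illustrative: the `Severity` enum, function names, and category keys below are my own shorthand, not identifiers from OpenAI's framework.

```python
from enum import IntEnum

class Severity(IntEnum):
    """Illustrative ordering of the framework's four-point scale."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

# The four tracked risk categories, as named in the lesson.
CATEGORIES = ("cybersecurity", "cbrn", "persuasion", "model_autonomy")

def may_deploy(post_mitigation: dict[str, Severity]) -> bool:
    # Deployment gate: every category must be brought to MEDIUM or
    # below after mitigations are applied.
    return max(post_mitigation.values()) <= Severity.MEDIUM

def may_continue_training(pre_mitigation: dict[str, Severity]) -> bool:
    # Scaling gate: a single pre-mitigation CRITICAL halts further
    # training. Mitigating the deployed model does not reset this
    # score, which is why the pre-mitigation number is the one that
    # governs the next training run.
    return all(s < Severity.CRITICAL for s in pre_mitigation.values())
```

Note the asymmetry: `may_deploy` looks at post-mitigation scores, while `may_continue_training` looks at pre-mitigation scores, capturing the lesson's point that you cannot mitigate your way into permission to scale.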
| Dimension | Anthropic RSP | OpenAI Preparedness | DeepMind FSF |
|---|---|---|---|
| Structure | Levels (ASL-1..5) | Categories × severity | Critical Capability Levels |
| Triggered by | Capability thresholds | Pre-mitigation scores | Evaluation outcomes |
| Can pause training? | Yes | Yes (critical) | Yes |
| External input | Board, policy team | Safety Advisory Group | Internal review |
| Weight security level | Escalates with ASL | Escalates with severity | Escalates |
The EU AI Act points to the GPAI Code of Practice and treats RSP-style commitments by signatories as a presumption of compliance for some systemic-risk obligations. The UK AISI's MOUs with labs include pre-release evaluation rights that rely on the labs' own capability classifications. US Executive Order 14110 (2023) required reporting on dual-use foundation models; Trump administration executive orders modified it in early 2025, but some evaluation requirements were preserved. Soft commitments and hard law are converging, slowly.
A commitment is not a guarantee. It is a promise plus a mechanism for catching yourself when you are about to break it.
— An RSP co-author
The big idea: RSPs are the frontier labs' admission that some capabilities should change how a model is handled. They are imperfect, self-enforced, and strictly better than nothing. Knowing their architecture lets you read any new safety announcement against the actual document it cites.