Responsible Scaling Policies Explained

RSPs are the frontier labs' self-imposed rules for what capability thresholds trigger which safeguards. Here is what they commit to, what they hedge on, and what the enforcement problem is.

45 min · Reviewed 2026

A Self-Imposed Brake

A Responsible Scaling Policy (RSP) is a frontier lab's public commitment to pause or add safeguards when models cross defined capability thresholds. Anthropic introduced the concept in September 2023. OpenAI published its analog (Preparedness Framework) in December 2023. Google DeepMind followed with its Frontier Safety Framework in May 2024. Meta, xAI, and Cohere all have some version now.

The Anthropic RSP architecture (v2.2, 2025)

Anthropic's RSP defines AI Safety Levels (ASL) modeled on biosafety levels (BSL) for pathogens. Each level specifies capabilities and required safeguards.

ASL-1: No meaningful catastrophic risk (2018-era models, narrow chess AIs)
ASL-2: Early signs of dangerous capability but unreliable (most current LLMs)
ASL-3: Substantial uplift on CBRN misuse or meaningful autonomous AI R&D. Triggered May 2025 for Anthropic's frontier models. Requires stronger weight security and deployment restrictions.
ASL-4: Not yet defined concretely; associated with qualitative escalations in autonomy and misuse potential
ASL-5: Placeholder for far-future concerns

What capabilities actually trigger the next level?

CBRN uplift: can the model meaningfully help a novice with a bioweapon?
Cyber uplift: does it enable attacks that were previously infeasible?
Autonomous replication: can it copy itself across machines without human help?
AI R&D acceleration: can it fully automate junior-researcher AI work?
Persuasion: does it meaningfully exceed human baselines in influence operations?

OpenAI's Preparedness Framework

OpenAI rates models across four risk categories (cybersecurity, CBRN, persuasion, model autonomy) on a low/medium/high/critical scale. Models scoring high can be deployed only with mitigations; models scoring critical cannot be trained further until mitigations work. Pre-mitigation score matters — mitigating a critical model back to medium does not let you train the next one without reviewing.

Compare: the three big-lab frameworks

Dimension	Anthropic RSP	OpenAI Preparedness	DeepMind FSF
Structure	Levels (ASL-1..5)	Categories × severity	Critical Capability Levels
Triggered by	Capability thresholds	Pre-mitigation scores	Evaluation outcomes
Can pause training?	Yes	Yes (critical)	Yes
External input	Board, policy team	Safety Advisory Group	Internal review
Weight security level	Escalates with ASL	Escalates with severity	Escalates

What RSPs get right

Specificity: concrete capability thresholds, not vague commitments
Pre-commitment: easier to honor restrictions announced before a race
Safeguard ladders: security and deployment measures scale with capability
Publicness: the document can be evaluated and critiqued by outsiders
Interoperability: Seoul Summit Commitments use RSP-style language

What critics hammer

Self-imposed: no external enforcement, can be amended by the company at will
Revisable mid-race: Anthropic moved some thresholds between RSP versions
No peer consistency: what triggers ASL-3 at Anthropic is not what triggers high at OpenAI
Eval quality: thresholds are only as good as the evaluations probing them — and evals are still immature
Incentive problem: the company running the evals has a business interest in passing

The relationship to law

The EU AI Act refers to GPAI Code of Practice signatories and treats RSP-style commitments as presumption-of-compliance for some systemic-risk obligations. The UK AISI MOUs with labs include pre-release evaluation rights that rely on the labs' own classifications. US Executive Order 14110 (2023) required dual-use foundation model reporting, modified by Trump administration EOs in early 2025 but with some evaluation requirements preserved. Soft commitments and hard law are converging, slowly.

A commitment is not a guarantee. It is a promise plus a mechanism for catching yourself when you are about to break it.
— An RSP co-author

The big idea: RSPs are the frontier labs' admission that some capabilities should change how a model is handled. They are imperfect, self-enforced, and strictly better than nothing. Knowing their architecture lets you read any new safety announcement against the actual document it cites.

End-of-lesson check

15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-ethics-rsp-creators

What is the core idea behind "Responsible Scaling Policies Explained"?
1. RSPs are the frontier labs' self-imposed rules for what capability thresholds trigger which safeguards. Here is what they commit to, what they hedge on, and what the enforcement problem is.
2. Generate acknowledgment templates that respect the reporter
3. Restructure raw notes on responsible disclosure letter narrative into a coherent…
4. audience vulnerability
Which term best describes a foundational idea in "Responsible Scaling Policies Explained"?
1. ASL
2. Responsible Scaling Policy
3. Preparedness Framework
4. capability threshold
A learner studying Responsible Scaling Policies Explained would need to understand which concept?
1. Responsible Scaling Policy
2. Preparedness Framework
3. ASL
4. capability threshold
Which of these is directly relevant to Responsible Scaling Policies Explained?
1. Responsible Scaling Policy
2. ASL
3. capability threshold
4. Preparedness Framework
Which of the following is a key point about Responsible Scaling Policies Explained?
1. ASL-1: No meaningful catastrophic risk (2018-era models, narrow chess AIs)
2. ASL-2: Early signs of dangerous capability but unreliable (most current LLMs)
3. ASL-3: Substantial uplift on CBRN misuse or meaningful autonomous AI R&D.
4. ASL-4: Not yet defined concretely; associated with qualitative escalations in autonomy and misuse po…
Which of these does NOT belong in a discussion of Responsible Scaling Policies Explained?
1. ASL-3: Substantial uplift on CBRN misuse or meaningful autonomous AI R&D.
2. ASL-1: No meaningful catastrophic risk (2018-era models, narrow chess AIs)
3. Generate acknowledgment templates that respect the reporter
4. ASL-2: Early signs of dangerous capability but unreliable (most current LLMs)
Which statement is accurate regarding Responsible Scaling Policies Explained?
1. Cyber uplift: does it enable attacks that were previously infeasible?
2. Autonomous replication: can it copy itself across machines without human help?
3. CBRN uplift: can the model meaningfully help a novice with a bioweapon?
4. AI R&D acceleration: can it fully automate junior-researcher AI work?
Which of these does NOT belong in a discussion of Responsible Scaling Policies Explained?
1. CBRN uplift: can the model meaningfully help a novice with a bioweapon?
2. Cyber uplift: does it enable attacks that were previously infeasible?
3. Generate acknowledgment templates that respect the reporter
4. Autonomous replication: can it copy itself across machines without human help?
What is the key insight about "The real test is yet to come" in the context of Responsible Scaling Policies Explained?
1. Every company with an RSP has, so far, found ways to cross implied capability lines while either pausing briefly, updati…
2. Generate acknowledgment templates that respect the reporter
3. Restructure raw notes on responsible disclosure letter narrative into a coherent…
4. audience vulnerability
What is the recommended tip about "Key insight" in the context of Responsible Scaling Policies Explained?
1. Generate acknowledgment templates that respect the reporter
2. RSPs are the frontier labs' self-imposed rules for what capability thresholds trigger which safeguards.
3. Restructure raw notes on responsible disclosure letter narrative into a coherent…
4. audience vulnerability
Which statement accurately describes an aspect of Responsible Scaling Policies Explained?
1. Generate acknowledgment templates that respect the reporter
2. Restructure raw notes on responsible disclosure letter narrative into a coherent…
3. A Responsible Scaling Policy (RSP) is a frontier lab's public commitment to pause or add safeguards when models cross defined capability thr…
4. audience vulnerability
What does working with Responsible Scaling Policies Explained typically involve?
1. Generate acknowledgment templates that respect the reporter
2. Restructure raw notes on responsible disclosure letter narrative into a coherent…
3. audience vulnerability
4. Anthropic's RSP defines AI Safety Levels (ASL) modeled on biosafety levels (BSL) for pathogens.
Which of the following is true about Responsible Scaling Policies Explained?
1. OpenAI rates models across four risk categories (cybersecurity, CBRN, persuasion, model autonomy) on a low/medium/high/critical scale.
2. Generate acknowledgment templates that respect the reporter
3. Restructure raw notes on responsible disclosure letter narrative into a coherent…
4. audience vulnerability
Which best describes the scope of "Responsible Scaling Policies Explained"?
1. It is unrelated to ethics workflows
2. It focuses on RSPs are the frontier labs' self-imposed rules for what capability thresholds trigger which safeguar
3. It applies only to the opposite beginner tier
4. It was deprecated in 2024 and no longer relevant
Which section heading best belongs in a lesson about Responsible Scaling Policies Explained?
1. Generate acknowledgment templates that respect the reporter
2. Restructure raw notes on responsible disclosure letter narrative into a coherent…
3. The Anthropic RSP architecture (v2.2, 2025)
4. audience vulnerability

← Back to interactive lesson

Tendril · Creators · Ethics & Society