AI Safety Orgs and How They Actually Operate
The AI safety ecosystem is small, influential, and often misunderstood. Here is who does what, how they get funded, and how to tell real work from rhetoric.
Lesson map
The main moves in order:
1. Who Does the Work
2. AISI
3. METR
4. Apollo Research
Section 1
Who Does the Work
When people say "AI safety research," they can mean dozens of different groups doing very different things. Some do empirical evals. Some do interpretability. Some do policy. Some do field-building. Pretending they all agree, or all do the same work, is a common mistake.
Government-run evaluation bodies
- UK AI Security Institute (AISI, formerly the UK AI Safety Institute): founded Nov 2023, ~100+ researchers by 2025. Runs pre-release evals of frontier models, publishes frontier trends reports, and funds alignment research grants.
- US AI Safety Institute (renamed CAISI at NIST in 2025): sits alongside NIST's AI Risk Management Framework and runs pre-deployment tests with Anthropic and OpenAI
- EU AI Office: within the European Commission, implementing the general-purpose AI (GPAI) obligations under the AI Act
- Singapore's AI Verify Foundation, Japan's AISI, Korea's AISI: the national-evaluator model is spreading
Independent nonprofits and research orgs
- METR (Model Evaluation & Threat Research): autonomy and capability evaluations, time-horizon benchmarks; began as ARC Evals in 2022 and became METR in 2023
- Apollo Research: deceptive alignment and scheming evals, published high-profile o1 findings in 2024
- Redwood Research: AI control, adversarial training, mechanistic interpretability
- Alignment Research Center (ARC): theoretical alignment research; its ARC Evals team ran the original autonomous-replication evals before spinning out as METR
- Center for AI Safety (CAIS): field-building, published the 2023 extinction-risk statement signed by most frontier lab CEOs
- AI Objectives Institute, FAR AI, Safe AI Forum: smaller specialized groups
Lab-internal safety teams
- Anthropic: Alignment team, Interpretability team, Frontier Red Team, Policy — central to the product
- OpenAI: Safety Systems, Preparedness (reorganized after 2024 staff departures), Policy
- Google DeepMind: AGI Safety & Alignment, Responsibility & Safety Council
- Meta AI: FAIR safety group, narrower scope
- xAI, Mistral, Cohere: smaller, more recent safety teams
Compare: what each type can and cannot do
| Actor | Access to model internals | Independent of lab | Can stop a release |
|---|---|---|---|
| Internal safety team | Yes | No | Sometimes (company governance) |
| METR/Apollo | API or sandboxed | Yes | No (but publish findings) |
| UK/US AISI | Pre-release, under NDA | Yes (government) | No formal veto yet |
| EU AI Office | Documentation, testing rights | Yes | Yes for systemic-risk GPAI |
| Academic researchers | Mostly public API | Yes | No |
How they actually get paid
- Government: UK AISI had £100M+ initial commitment; US AISI funded via NIST
- Open Philanthropy: largest AI safety funder, backed METR, Redwood, ARC, CAIS, others — ~$100M+/year on AI safety
- Survival and Flourishing Fund, Long-Term Future Fund, FTX Future Fund (defunct since FTX's 2022 collapse), Founders Pledge
- Labs paying external evaluators directly (contested: critics say it compromises independence)
- Academic grants: NSF, EU Horizon, UKRI — smaller but growing
The critiques you should know
- Concentration of funding: Open Philanthropy's dominance creates homogeneity of research directions
- Safety-washing: some corporate 'safety' work is closer to PR than rigorous evaluation
- Culture conflicts: tension between rationalist/EA-heritage groups and more traditional ML research communities
- Capture risk: evaluators paid by labs they evaluate have an obvious conflict
- The present-harms critique: some argue the focus on speculative catastrophic risk distracts from harms already here, like bias (a separate objection from accelerationists, who argue safety work needlessly slows progress)
Where this is heading
The 2024 Seoul AI Safety Summit produced the Frontier AI Safety Commitments, in which 16 major labs pledged specific pre-deployment evals and capability thresholds. The 2025 Paris summit rebranded as the AI Action Summit and broadened its focus toward adoption and economic opportunity. Governance is bifurcating: pre-deployment safety evals (AISI-style) on one track, general AI policy (the AI Act, executive orders) on another.
“Safety is not a department. It is a property of the whole system, and it emerges from the culture as much as the team.”
The big idea: AI safety is an actual ecosystem with real people doing real work. Knowing the map — who does what, who funds what, who can stop what — lets you read any AI safety headline with the context it deserves.
