AI Safety Orgs and How They Actually Operate
The AI safety ecosystem is small, influential, and often misunderstood. Here is who does what, how they get funded, and how to tell real work from rhetoric.
Lesson map
The main moves in order:
1. Who Does the Work
2. AISI
3. METR
4. Apollo Research
Section 1
Who Does the Work
When people say "AI safety research," they can mean dozens of different groups doing very different things. Some do empirical evals. Some do interpretability. Some do policy. Some do field-building. Pretending they all agree, or all do the same work, is a common mistake.
Government-run evaluation bodies
- UK AI Security Institute (AISI, formerly the UK AI Safety Institute): founded Nov 2023, ~100+ researchers by 2025. Runs pre-release evals of frontier models, publishes frontier trends reports, and funds alignment research grants.
- US AI Safety Institute (renamed CAISI at NIST in 2025): sits alongside NIST's AI Risk Management Framework and runs pre-deployment tests with Anthropic and OpenAI
- EU AI Office: within the European Commission, implementing the general-purpose AI (GPAI) obligations under the AI Act
- Singapore's AI Verify Foundation, Japan's AISI, Korea's AISI: the national-evaluator model is spreading
Independent nonprofits and research orgs
- METR (Model Evaluation & Threat Research): autonomy and capability evaluations, time-horizon benchmarks; began as ARC Evals in 2022 and became METR in 2023
- Apollo Research: deceptive alignment and scheming evals, published high-profile o1 findings in 2024
- Redwood Research: AI control, adversarial training, mechanistic interpretability
- Alignment Research Center (ARC): theoretical alignment research; its ARC Evals team ran the original autonomous-replication evals before spinning out as METR
- Center for AI Safety (CAIS): field-building, published the 2023 extinction-risk statement signed by most frontier lab CEOs
- AI Objectives Institute, FAR AI, Safe AI Forum: smaller specialized groups
Lab-internal safety teams
- Anthropic: Alignment team, Interpretability team, Frontier Red Team, Policy — central to the product
- OpenAI: Safety Systems, Preparedness (reorganized after 2024 staff departures), Policy
- Google DeepMind: AGI Safety & Alignment, Responsibility & Safety Council
- Meta AI: FAIR safety group, narrower scope
- xAI, Mistral, Cohere: smaller, more recent safety teams
Compare: what each type can and cannot do
| Actor | Access to model internals | Independent of lab | Can stop a release |
|---|---|---|---|
| Internal safety team | Yes | No | Sometimes (company governance) |
| METR/Apollo | API or sandboxed | Yes | No (but publish findings) |
| UK/US AISI | Pre-release, under NDA | Yes (government) | No formal veto yet |
| EU AI Office | Documentation, testing rights | Yes | Yes for systemic-risk GPAI |
| Academic researchers | Mostly public API | Yes | No |
How they actually get paid
- Government: UK AISI had £100M+ initial commitment; US AISI funded via NIST
- Open Philanthropy: largest AI safety funder, backed METR, Redwood, ARC, CAIS, others — ~$100M+/year on AI safety
- Survival and Flourishing Fund, Long-Term Future Fund, FTX Future Fund (defunct since FTX's 2022 collapse), Founders Pledge
- Labs paying external evaluators directly (contested: critics say it compromises independence)
- Academic grants: NSF, EU Horizon, UKRI — smaller but growing
The critiques you should know
- Concentration of funding: Open Philanthropy's dominance creates homogeneity of research directions
- Safety-washing: some corporate 'safety' work is closer to PR than rigorous evaluation
- Culture conflicts: tension between rationalist/EA-heritage groups and more traditional ML research communities
- Capture risk: evaluators paid by labs they evaluate have an obvious conflict
- The present-harms critique: some argue the focus on speculative catastrophic risk distracts from harms already here, like bias (a separate objection from accelerationists, who argue safety work needlessly slows progress)
Where this is heading
The 2024 Seoul AI Safety Summit produced the Frontier AI Safety Commitments, in which 16 major labs pledged specific pre-deployment evals and capability thresholds. The 2025 Paris summit rebranded as the AI Action Summit and broadened its focus toward adoption and economic opportunity. Governance is bifurcating: pre-deployment safety evals (AISI-style) on one track, general AI policy (the AI Act, executive orders) on another.
“Safety is not a department. It is a property of the whole system, and it emerges from the culture as much as the team.”
The big idea: AI safety is an actual ecosystem with real people doing real work. Knowing the map — who does what, who funds what, who can stop what — lets you read any AI safety headline with the context it deserves.
