The AI safety ecosystem is small, influential, and often misunderstood. Here is who does what, how they get funded, and how to tell real work from rhetoric.
When people say "AI safety research," they can mean dozens of different groups doing very different things. Some do empirical evals. Some do interpretability. Some do policy. Some do field-building. Pretending they all agree, or all do the same thing, is a common mistake.
| Actor | Access to model internals | Independent of lab | Can stop a release |
|---|---|---|---|
| Internal safety team | Yes | No | Sometimes (company governance) |
| METR/Apollo | API or sandboxed | Yes | No (but publish findings) |
| UK/US AISI | Pre-release, under NDA | Yes (government) | No formal veto yet |
| EU AI Office | Documentation, testing rights | Yes | Yes for systemic-risk GPAI |
| Academic researchers | Mostly public API | Yes | No |
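One way to make the table's structure concrete is to encode each actor as a record and query it. The sketch below is illustrative only (the class and field names are my own, not from any real registry); it paraphrases the table's values and shows the kind of question the table lets you answer, such as "which independent actors hold any formal power to block a release?"

```python
from dataclasses import dataclass

@dataclass
class Actor:
    name: str
    internals_access: str   # what level of model access the actor gets
    independent: bool       # independent of the lab being evaluated
    can_block_release: str  # formal power to stop a deployment

# Encoding of the comparison table above (values paraphrase the table rows)
ACTORS = [
    Actor("Internal safety team", "full", False, "sometimes (company governance)"),
    Actor("METR/Apollo", "API or sandboxed", True, "no (but publish findings)"),
    Actor("UK/US AISI", "pre-release under NDA", True, "no formal veto yet"),
    Actor("EU AI Office", "documentation, testing rights", True, "yes (systemic-risk GPAI)"),
    Actor("Academic researchers", "mostly public API", True, "no"),
]

# Query: independent actors with an unambiguous power to block a release
blockers = [a.name for a in ACTORS
            if a.independent and a.can_block_release.startswith("yes")]
print(blockers)  # ['EU AI Office']
```

Under this encoding, only one row satisfies both conditions, which is the table's quiet punchline: independence and veto power rarely coincide.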
The 2024 Seoul AI Safety Summit produced the Frontier AI Safety Commitments, under which 16 major labs pledged specific pre-deployment evals and capability thresholds. The 2025 Paris summit rebranded itself as the AI Action Summit and broadened the agenda toward AI's economic opportunities. Governance is bifurcating: pre-deployment safety evals (AISI-style) on one track, general AI policy (the EU AI Act, executive orders) on another.
> Safety is not a department. It is a property of the whole system, and it emerges from the culture as much as the team.
>
> — Helen Toner, former OpenAI board member
The big idea: AI safety is an actual ecosystem with real people doing real work. Knowing the map — who does what, who funds what, who can stop what — lets you read any AI safety headline with the context it deserves.
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-ethics-safety-orgs-creators
What is the core idea behind "AI Safety Orgs and How They Actually Operate"?
Which term best describes a foundational idea in "AI Safety Orgs and How They Actually Operate"?
A learner studying AI Safety Orgs and How They Actually Operate would need to understand which concept?
Which of these is directly relevant to AI Safety Orgs and How They Actually Operate?
Which of the following is a key point about AI Safety Orgs and How They Actually Operate?
Which of these does NOT belong in a discussion of AI Safety Orgs and How They Actually Operate?
Which statement is accurate regarding AI Safety Orgs and How They Actually Operate?
Which claim would be out of place in a discussion of AI Safety Orgs and How They Actually Operate?
What is the key insight about "How to tell real work from rhetoric" in the context of AI Safety Orgs and How They Actually Operate?
What practical tip does "AI Safety Orgs and How They Actually Operate" recommend to readers?
Which statement accurately describes an aspect of AI Safety Orgs and How They Actually Operate?
What does working with AI Safety Orgs and How They Actually Operate typically involve?
Which of the following is true about AI Safety Orgs and How They Actually Operate?
Which best describes the scope of "AI Safety Orgs and How They Actually Operate"?
Which section heading best belongs in a lesson about AI Safety Orgs and How They Actually Operate?