## Why compute

Regulators want to catch frontier risk without freezing the whole field. They need a trigger that's measurable, hard to game, and correlated with capability. Training compute (the number of floating-point operations used during model training) is the best proxy currently available.
## The main thresholds (as of ~2025)

| Regime | Threshold | Applies to |
|---|---|---|
| EU AI Act (systemic risk) | 10^25 FLOPs | General-purpose models |
| Biden EO 14110 (rescinded) | 10^26 FLOPs | Reporting to US government |
| Biden EO (bio subset) | 10^23 FLOPs | Biologically-focused models |
| California SB 1047 (vetoed) | 10^26 FLOPs + $100M | Covered models |
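As a quick sanity aid, the regimes and FLOP values above can be encoded as a lookup. This is an illustrative sketch, not legal logic: the `THRESHOLDS` dict and `triggered_regimes` function are invented names, and real applicability turns on model type and extra criteria (the bio-subset row applies only to biologically-focused models, and SB 1047 also required roughly $100M in training cost).

```python
# Illustrative lookup of the compute thresholds in the table above.
# Regime names and FLOP values come from the table; everything else
# (names, structure) is a sketch, not how any regulator evaluates models.
THRESHOLDS = {
    "EU AI Act (systemic risk)": 1e25,
    "Biden EO 14110 (rescinded)": 1e26,
    "Biden EO (bio subset)": 1e23,        # biologically-focused models only
    "California SB 1047 (vetoed)": 1e26,  # also required ~$100M training cost
}

def triggered_regimes(training_flops: float) -> list[str]:
    """Return the regimes whose compute threshold is met or exceeded."""
    return [name for name, floor in THRESHOLDS.items() if training_flops >= floor]

# A model trained with ~2e25 FLOPs (GPT-4 scale) clears the EU line but
# stays under both 1e26 lines.
print(triggered_regimes(2e25))
```

Note that a pure FLOP comparison over-triggers the bio-subset row for general-purpose models, which is exactly the kind of scoping detail the raw numbers leave out.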
## Scale context

GPT-4 is estimated at roughly 2×10^25 FLOPs. Llama 3.1 405B at roughly 4×10^25. Future frontier models will cross 10^26. The thresholds aren't arbitrary: they're set just above the current top of the field and below the next generation.

## Why this approach has critics

- Algorithmic efficiency delivers the same capability at lower compute over time, so thresholds drift
- Small specialized models can be dangerous without being compute-heavy
- Inference compute (test-time reasoning like o1) is not captured by training FLOPs
- Distillation can transfer capability from a big model to a small one
- Open-source releases create compute-independent diffusion

## Better proxies needed

Researchers are exploring capability-based triggers (can the model do dangerous task X?) as alternatives. These are harder to measure but more meaningful. Expect compute thresholds to be joined by capability triggers over the next several years.

Key terms: FLOP · compute threshold · training compute · inference compute

The big idea: compute is the only thing regulators can easily count. That makes it the default regulatory hook, for better and worse.
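The scale figures above can be sanity-checked with the widely used approximation that dense-transformer training compute is about 6 × parameters × training tokens. A back-of-envelope sketch (the function name is ours, and the ~15-trillion-token figure for Llama 3.1 is a rough public estimate; treat all results as order-of-magnitude):

```python
# Back-of-envelope training-compute estimate using the common
# C ≈ 6 * N * D approximation (N = parameters, D = training tokens).
# Inputs are rough public figures; results are order-of-magnitude only.

def training_flops(params: float, tokens: float) -> float:
    """Approximate training FLOPs for a dense transformer."""
    return 6 * params * tokens

# Llama 3.1 405B, trained on roughly 15 trillion tokens:
c = training_flops(405e9, 15e12)
print(f"{c:.1e}")  # on the order of 4e25: above the EU's 1e25 line,
                   # below the (rescinded) US EO's 1e26 line
```

This is also why threshold drift matters: the same formula says a 10x efficiency gain in tokens-per-capability pulls tomorrow's equivalent model an order of magnitude under today's line.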
## Key insight

Almost every AI regulation uses training compute as a trigger: 10^25 here, 10^26 there. Why compute, and why those numbers?

## End-of-lesson check

15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-safety2-compute-thresholds-builders
1. What is the core idea behind "Compute Thresholds: Regulating by FLOPs"?
   - a) Almost every AI regulation uses training compute as a trigger. 10^25 here, 10^26 there. Why compute, and why those numbers?
   - b) training-time alignment
   - c) Attackers need one path. Defenders must close all paths.
   - d) Coordination with the US AI Safety Institute (later CAISI) and similar bodies

2. Which term best describes a foundational idea in "Compute Thresholds: Regulating by FLOPs"?
   - a) compute threshold
   - b) FLOP
   - c) training compute
   - d) inference compute

3. A learner studying Compute Thresholds: Regulating by FLOPs would need to understand which concept?
   - a) FLOP
   - b) training compute
   - c) compute threshold
   - d) inference compute

4. Which of these is directly relevant to Compute Thresholds: Regulating by FLOPs?
   - a) FLOP
   - b) compute threshold
   - c) inference compute
   - d) training compute

5. Which of the following is a key point about Compute Thresholds: Regulating by FLOPs?
   - a) Algorithmic efficiency means same capability at lower compute over time — thresholds drift
   - b) Small specialized models can be dangerous without being compute-heavy
   - c) Inference compute (test-time reasoning like o1) is not captured by training FLOPs
   - d) Distillation can transfer capability from a big model to a small one

6. Which of these does NOT belong in a discussion of Compute Thresholds: Regulating by FLOPs?
   - a) Algorithmic efficiency means same capability at lower compute over time — thresholds drift
   - b) Small specialized models can be dangerous without being compute-heavy
   - c) Inference compute (test-time reasoning like o1) is not captured by training FLOPs
   - d) training-time alignment

7. What is the key insight about "Scale context" in the context of Compute Thresholds: Regulating by FLOPs?
   - a) training-time alignment
   - b) Attackers need one path. Defenders must close all paths.
   - c) GPT-4 is estimated at roughly 2×10^25 FLOPs. Llama 3.1 405B at roughly 4×10^25. Future frontier models will cross 10^26.
   - d) Coordination with the US AI Safety Institute (later CAISI) and similar bodies

8. What is the key insight about "Better proxies needed" in the context of Compute Thresholds: Regulating by FLOPs?
   - a) training-time alignment
   - b) Attackers need one path. Defenders must close all paths.
   - c) Coordination with the US AI Safety Institute (later CAISI) and similar bodies
   - d) Researchers are exploring capability-based triggers — can the model do dangerous task X? — as alternatives.

9. Which statement accurately describes an aspect of Compute Thresholds: Regulating by FLOPs?
   - a) Regulators want to catch frontier risk without freezing the whole field.
   - b) training-time alignment
   - c) Attackers need one path. Defenders must close all paths.
   - d) Coordination with the US AI Safety Institute (later CAISI) and similar bodies

10. What does working with Compute Thresholds: Regulating by FLOPs typically involve?
    - a) training-time alignment
    - b) The big idea: compute is the only thing regulators can easily count. That makes it the default regulatory hook, for better and worse.
    - c) Attackers need one path. Defenders must close all paths.
    - d) Coordination with the US AI Safety Institute (later CAISI) and similar bodies

11. Which best describes the scope of "Compute Thresholds: Regulating by FLOPs"?
    - a) It is unrelated to ethics workflows
    - b) It applies only to the opposite beginner tier
    - c) It focuses on how almost every AI regulation uses training compute as a trigger: 10^25 here, 10^26 there
    - d) It was deprecated in 2024 and no longer relevant

12. Which section heading best belongs in a lesson about Compute Thresholds: Regulating by FLOPs?
    - a) training-time alignment
    - b) Attackers need one path. Defenders must close all paths.
    - c) Coordination with the US AI Safety Institute (later CAISI) and similar bodies
    - d) The main thresholds (as of ~2025)

13. Which section heading best belongs in a lesson about Compute Thresholds: Regulating by FLOPs?
    - a) Why this approach has critics
    - b) training-time alignment
    - c) Attackers need one path. Defenders must close all paths.
    - d) Coordination with the US AI Safety Institute (later CAISI) and similar bodies

14. Which of the following is a concept covered in Compute Thresholds: Regulating by FLOPs?
    - a) compute threshold
    - b) FLOP
    - c) training compute
    - d) inference compute

15. Which of the following is a concept covered in Compute Thresholds: Regulating by FLOPs?
    - a) FLOP
    - b) training compute
    - c) compute threshold
    - d) inference compute