The Environmental Cost of Training a Big Model
Training a frontier model uses the electricity of a small city for months. Running inference at scale is pushing global data center demand toward the level of a large country. Here is what the numbers actually look like.
Lesson map
The main moves in order
1. Compute Has a Physical Footprint
2. Compute cost
3. Electricity
4. Water cooling
Section 1
Compute Has a Physical Footprint
Every token your favorite AI generates came out of a GPU that was drawing electricity from a grid that was, somewhere upstream, burning something or spinning something. The abstraction is clean. The physics is not.
Training numbers you should actually know
- GPT-3 (2020): estimated ~1,287 MWh to train, roughly 500 metric tons of CO2
- GPT-4 class (2023): publicly estimated at tens of thousands of MWh
- Frontier 2025 training runs: hundreds of GWh, matching small-city annual consumption
- Llama 3 70B: Meta disclosed ~1,900 MWh
- Training electricity is a one-time cost — inference is forever
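To see where figures like these come from, here is a minimal back-of-envelope sketch. Every input (cluster size, power draw, duration, PUE, grid carbon intensity) is an illustrative assumption, not a disclosed figure for any real model.

```python
# Back-of-envelope training energy estimate. Every input is an illustrative
# assumption, not a disclosed figure for any real model.

gpus = 10_000              # accelerators in the training cluster (assumed)
gpu_power_kw = 0.7         # average draw per accelerator, kW (assumed, H100-class)
days = 90                  # wall-clock training time (assumed)
pue = 1.2                  # power usage effectiveness: cooling/overhead multiplier (assumed)
grid_kg_co2_per_kwh = 0.4  # grid carbon intensity, kg CO2 per kWh (assumed)

energy_mwh = gpus * gpu_power_kw * days * 24 * pue / 1_000
co2_tons = energy_mwh * grid_kg_co2_per_kwh  # MWh x kg/kWh == metric tons

print(f"Energy: {energy_mwh:,.0f} MWh")        # ~18,144 MWh with these inputs
print(f"CO2:    {co2_tons:,.0f} metric tons")  # ~7,258 t with these inputs
```

With these invented but plausible inputs you land in the tens-of-thousands-of-MWh range quoted for GPT-4-class runs; a cluster ten times larger pushes into the hundreds-of-GWh territory of 2025 frontier training.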
Inference is the real long-term story
Training a model takes weeks or months and then stops. Running it serves billions of users every day for years. By 2025, most analyses estimated that inference consumed more energy than training across the industry. The IEA projected that global data center electricity use could reach 945 TWh by 2030, with AI as the fastest-growing slice.
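A toy calculation shows why inference wins over time. The traffic and per-query figures below are assumptions, drawn from the ranges in the comparison table later in this lesson.

```python
# How fast does cumulative inference energy overtake a one-time training cost?
# All inputs are illustrative assumptions.

training_mwh = 20_000          # one-time training energy (assumed)
queries_per_day = 500_000_000  # chat turns served daily (assumed)
wh_per_query = 5               # energy per chat turn, Wh (assumed mid-range)

inference_mwh_per_day = queries_per_day * wh_per_query / 1_000_000  # Wh -> MWh
breakeven_days = training_mwh / inference_mwh_per_day

print(f"Inference load: {inference_mwh_per_day:,.0f} MWh/day")  # 2,500
print(f"Overtakes training after {breakeven_days:.0f} days")    # 8
```

At anything like consumer-app scale, the training bill is repaid in energy terms within days; everything after that is pure inference load.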
Water is the other quiet variable
Data centers are cooled by enormous amounts of water. A 2023 study estimated GPT-3 training consumed about 700,000 liters of fresh water. Microsoft's water use jumped 34 percent from 2021 to 2022, partly attributed to AI. In drought-prone regions like Arizona, this has become a political fight.
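Water use scales with energy use through a ratio called water usage effectiveness (WUE): liters consumed per kWh delivered. A minimal sketch, using an assumed WUE that happens to roughly reproduce the study's figure:

```python
# Rough water-footprint estimate via WUE (liters consumed per kWh).
# The WUE value is an assumption; real facilities vary widely by climate.

training_kwh = 1_287_000  # GPT-3-scale training energy (1,287 MWh, from above)
wue_l_per_kwh = 0.55      # assumed on-site water usage effectiveness

water_liters = training_kwh * wue_l_per_kwh
print(f"Cooling water: {water_liters:,.0f} liters")  # ~707,850 L
```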
Compare: common computing footprints
| Activity | Rough energy cost |
|---|---|
| Google search | ~0.3 Wh |
| LLM chat turn | ~3-10 Wh |
| Image generation | ~30-100 Wh |
| Video generation (per second) | ~300-1000 Wh |
| Train GPT-3 once | ~1.3 GWh |
| Train frontier 2025 model | 100+ GWh |
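To make the table concrete, here is a sketch of a hypothetical heavy user's annual energy. The per-item costs are the midpoints of the ranges above; the daily counts are pure assumptions.

```python
# Annual energy for a hypothetical heavy AI user. Per-item costs are the
# midpoints of the table's ranges; daily counts are assumptions.

daily_usage = {
    "searches":      (20, 0.3),   # (count per day, Wh each)
    "chat turns":    (50, 6.5),
    "images":        (5, 65.0),
    "video seconds": (10, 650.0),
}

wh_per_day = sum(count * wh for count, wh in daily_usage.values())
kwh_per_year = wh_per_day * 365 / 1_000

print(f"{wh_per_day:,.0f} Wh/day -> {kwh_per_year:,.0f} kWh/year")  # 7,156 -> 2,612
```

Note what dominates: ten seconds of generated video a day outweighs everything else combined, which is why the per-modality costs matter more than the raw query count.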
What is changing
- Model efficiency: smaller specialist models often beat giant ones per task
- Sparse mixture of experts: only activate parts of a model per query (see the sketch after this list)
- Custom silicon: TPUs, Trainium, Cerebras, Groq all target more ops per watt
- Renewable siting: data centers are chasing hydro, geothermal, and nuclear
- Nuclear renaissance: Amazon, Google, Microsoft all signed nuclear power deals in 2024-2025
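The mixture-of-experts bullet deserves one line of arithmetic: per-token compute scales with active parameters, not total parameters. The parameter counts below are hypothetical, and the 2-FLOPs-per-parameter figure is a standard rough estimate for a forward pass.

```python
# Why sparse mixture-of-experts saves energy: compute per generated token
# scales with *active* parameters. All parameter counts are hypothetical.

dense_params = 600e9   # a dense model of this size activates everything
moe_active = 40e9      # an MoE of the same total size touches only this many per token

FLOPS_PER_PARAM = 2    # rough rule of thumb for one forward pass

dense_flops = FLOPS_PER_PARAM * dense_params
moe_flops = FLOPS_PER_PARAM * moe_active

print(f"Dense: {dense_flops:.1e} FLOPs/token")
print(f"MoE:   {moe_flops:.1e} FLOPs/token ({dense_flops / moe_flops:.0f}x less)")
```

Same nominal capacity, roughly fifteen times less compute per token under these assumptions; that ratio flows straight through to energy per query.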
What is not changing
- Demand is growing faster than efficiency
- New AI features create new inference load, not less
- Training runs are getting bigger, not smaller
- Grid buildout takes a decade; model training takes months
“You cannot compute your way out of thermodynamics. Somebody always pays the electric bill.”
The big idea: AI has real physical costs that are often hidden behind a clean chat interface. Whether they are worth it is a judgment call — but you cannot make the call without knowing the numbers.