The Environmental Cost of Training a Big Model
Training a frontier model uses the electricity of a small city for months. Running inference at scale is pushing global data center demand toward the level of a large country. Here is what the numbers actually look like.
Lesson map
The main moves in order
1. Compute Has a Physical Footprint
2. Compute cost
3. Electricity
4. Water cooling
Section 1
Compute Has a Physical Footprint
Every token your favorite AI generates came out of a GPU that was drawing electricity from a grid that was, somewhere upstream, burning something or spinning something. The abstraction is clean. The physics is not.
Training numbers you should actually know
- GPT-3 (2020): estimated ~1,287 MWh to train, roughly 500 metric tons of CO2
- GPT-4 class (2023): publicly estimated at tens of thousands of MWh
- Frontier 2025 training runs: hundreds of GWh, matching small-city annual consumption
- Llama 3 70B: Meta disclosed ~1,900 MWh
- Training electricity is a one-time cost — inference is forever
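To see where figures like these come from, here is a minimal back-of-envelope sketch. Every input (cluster size, power draw, duration, PUE, grid carbon intensity) is an illustrative assumption, not a disclosed figure for any real model.

```python
# Back-of-envelope training energy estimate. Every input is an illustrative
# assumption, not a disclosed figure for any real model.

gpus = 10_000              # accelerators in the training cluster (assumed)
gpu_power_kw = 0.7         # average draw per accelerator, kW (assumed, H100-class)
days = 90                  # wall-clock training time (assumed)
pue = 1.2                  # power usage effectiveness: cooling/overhead multiplier (assumed)
grid_kg_co2_per_kwh = 0.4  # grid carbon intensity, kg CO2 per kWh (assumed)

energy_mwh = gpus * gpu_power_kw * days * 24 * pue / 1_000
co2_tons = energy_mwh * grid_kg_co2_per_kwh  # MWh x kg/kWh == metric tons

print(f"Energy: {energy_mwh:,.0f} MWh")        # ~18,144 MWh with these inputs
print(f"CO2:    {co2_tons:,.0f} metric tons")  # ~7,258 t with these inputs
```

With these invented but plausible inputs you land in the tens-of-thousands-of-MWh range quoted for GPT-4-class runs; a cluster ten times larger pushes into the hundreds-of-GWh territory of 2025 frontier training.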
Inference is the real long-term story
Training a model takes weeks or months and then stops. Running it serves billions of users every day for years. By 2025, most analyses estimated that inference consumed more energy than training across the industry. The IEA projected that global data center electricity use could reach 945 TWh by 2030, with AI as the fastest-growing slice.
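A toy calculation shows why inference wins over time. The traffic and per-query figures below are assumptions, drawn from the ranges in the comparison table later in this lesson.

```python
# How fast does cumulative inference energy overtake a one-time training cost?
# All inputs are illustrative assumptions.

training_mwh = 20_000          # one-time training energy (assumed)
queries_per_day = 500_000_000  # chat turns served daily (assumed)
wh_per_query = 5               # energy per chat turn, Wh (assumed mid-range)

inference_mwh_per_day = queries_per_day * wh_per_query / 1_000_000  # Wh -> MWh
breakeven_days = training_mwh / inference_mwh_per_day

print(f"Inference load: {inference_mwh_per_day:,.0f} MWh/day")  # 2,500
print(f"Overtakes training after {breakeven_days:.0f} days")    # 8
```

At anything like consumer-app scale, the training bill is repaid in energy terms within days; everything after that is pure inference load.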
Water is the other quiet variable
Data centers are cooled by enormous amounts of water. A 2023 study estimated GPT-3 training consumed about 700,000 liters of fresh water. Microsoft's water use jumped 34 percent from 2021 to 2022, partly attributed to AI. In drought-prone regions like Arizona, this has become a political fight.
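Water use scales with energy use through a ratio called water usage effectiveness (WUE): liters consumed per kWh delivered. A minimal sketch, using an assumed WUE that happens to roughly reproduce the study's figure:

```python
# Rough water-footprint estimate via WUE (liters consumed per kWh).
# The WUE value is an assumption; real facilities vary widely by climate.

training_kwh = 1_287_000  # GPT-3-scale training energy (1,287 MWh, from above)
wue_l_per_kwh = 0.55      # assumed on-site water usage effectiveness

water_liters = training_kwh * wue_l_per_kwh
print(f"Cooling water: {water_liters:,.0f} liters")  # ~707,850 L
```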
Compare: common computing footprints
| Activity | Rough energy cost |
|---|---|
| Google search | ~0.3 Wh |
| LLM chat turn | ~3-10 Wh |
| Image generation | ~30-100 Wh |
| Video generation (per second) | ~300-1000 Wh |
| Train GPT-3 once | ~1.3 GWh |
| Train frontier 2025 model | 100+ GWh |
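To make the table concrete, here is a sketch of a hypothetical heavy user's annual energy. The per-item costs are the midpoints of the ranges above; the daily counts are pure assumptions.

```python
# Annual energy for a hypothetical heavy AI user. Per-item costs are the
# midpoints of the table's ranges; daily counts are assumptions.

daily_usage = {
    "searches":      (20, 0.3),   # (count per day, Wh each)
    "chat turns":    (50, 6.5),
    "images":        (5, 65.0),
    "video seconds": (10, 650.0),
}

wh_per_day = sum(count * wh for count, wh in daily_usage.values())
kwh_per_year = wh_per_day * 365 / 1_000

print(f"{wh_per_day:,.0f} Wh/day -> {kwh_per_year:,.0f} kWh/year")  # 7,156 -> 2,612
```

Note what dominates: ten seconds of generated video a day outweighs everything else combined, which is why the per-modality costs matter more than the raw query count.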
What is changing
- Model efficiency: smaller specialist models often beat giant ones per task
- Sparse mixture of experts: only activate parts of a model per query (see the sketch after this list)
- Custom silicon: TPUs, Trainium, Cerebras, Groq all target more ops per watt
- Renewable siting: data centers are chasing hydro, geothermal, and nuclear
- Nuclear renaissance: Amazon, Google, Microsoft all signed nuclear power deals in 2024-2025
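The mixture-of-experts bullet deserves one line of arithmetic: per-token compute scales with active parameters, not total parameters. The parameter counts below are hypothetical, and the 2-FLOPs-per-parameter figure is a standard rough estimate for a forward pass.

```python
# Why sparse mixture-of-experts saves energy: compute per generated token
# scales with *active* parameters. All parameter counts are hypothetical.

dense_params = 600e9   # a dense model of this size activates everything
moe_active = 40e9      # an MoE of the same total size touches only this many per token

FLOPS_PER_PARAM = 2    # rough rule of thumb for one forward pass

dense_flops = FLOPS_PER_PARAM * dense_params
moe_flops = FLOPS_PER_PARAM * moe_active

print(f"Dense: {dense_flops:.1e} FLOPs/token")
print(f"MoE:   {moe_flops:.1e} FLOPs/token ({dense_flops / moe_flops:.0f}x less)")
```

Same nominal capacity, roughly fifteen times less compute per token under these assumptions; that ratio flows straight through to energy per query.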
What is not changing
- Demand is growing faster than efficiency
- New AI features create new inference load, not less
- Training runs are getting bigger, not smaller
- Grid buildout takes a decade; model training takes months
“You cannot compute your way out of thermodynamics. Somebody always pays the electric bill.”
The big idea: AI has real physical costs that are often hidden behind a clean chat interface. Whether they are worth it is a judgment call — but you cannot make the call without knowing the numbers.