Lesson 320 of 2116
GPT-3 and the Scaling Laws
In 2020, a 175 billion parameter model and a parallel paper on scaling laws redefined what bigger could mean.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. A Qualitative Leap From a Quantitative Change
2. GPT-3
3. Scaling laws
4. Few-shot learning
Section 1
A Qualitative Leap From a Quantitative Change
In May 2020, OpenAI published “Language Models are Few-Shot Learners,” the paper introducing GPT-3: a Transformer with 175 billion parameters, more than a hundred times larger than GPT-2, trained on hundreds of billions of tokens drawn from Common Crawl, books, and Wikipedia.
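The paper's headline result was few-shot learning: instead of fine-tuning, you show the model a handful of input/output examples directly in the prompt and it completes the pattern. A minimal sketch of that prompt format, using the English-to-French example from the paper (the helper function name is ours, for illustration):

```python
# Few-shot prompting: a few worked examples in the prompt, then a new input.
# The model is expected to continue the pattern on the final line.

examples = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
]

def few_shot_prompt(examples, query, task="Translate English to French:"):
    """Assemble a few-shot prompt: task description, demos, then the query."""
    lines = [task]
    for src, tgt in examples:
        lines.append(f"{src} => {tgt}")
    lines.append(f"{query} =>")  # left open for the model to complete
    return "\n".join(lines)

print(few_shot_prompt(examples, "otter"))
```

No gradient updates happen here; the "learning" is entirely in-context, which is what made the result surprising at this scale.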
The scaling laws paper (Kaplan et al., 2020)
- Loss scales as a power law with parameters, data, and compute
- Larger models are more sample-efficient than smaller ones
- Optimal compute allocation favors bigger models over longer training, for a given budget
- DeepMind's Chinchilla paper in 2022 refined these laws and recommended more data relative to parameters
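The power-law claim and the Chinchilla refinement can be sketched numerically. The exponent and scale constants below are the approximate parameter-scaling values reported by Kaplan et al. (alpha_N ≈ 0.076, N_c ≈ 8.8e13), and the "20 tokens per parameter" figure is the common rule of thumb drawn from Chinchilla; treat all of them as ballpark illustrations, not exact predictions:

```python
# Kaplan-style power law for loss as a function of parameter count,
# holding data and compute non-limiting: L(N) = (N_c / N) ** alpha_N.
# Constants are approximate published values, used here for illustration.

def loss_from_params(n_params: float, n_c: float = 8.8e13, alpha_n: float = 0.076) -> float:
    """Predicted cross-entropy loss (nats/token) from model size alone."""
    return (n_c / n_params) ** alpha_n

def chinchilla_tokens(n_params: float) -> float:
    """Chinchilla rule of thumb: roughly 20 training tokens per parameter."""
    return 20.0 * n_params

# Compare a GPT-2-scale model to a GPT-3-scale model.
for n in (1.5e9, 175e9):
    print(f"{n:.0e} params -> predicted loss {loss_from_params(n):.2f}, "
          f"Chinchilla-optimal tokens ~{chinchilla_tokens(n):.1e}")
```

Note the Chinchilla implication: a 175B-parameter model "wants" roughly 3.5 trillion tokens, an order of magnitude more than GPT-3 actually saw, which is why the 2022 paper argued earlier frontier models were undertrained.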
“Loss scales as a power law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude.”
The critical subtleties
- Scaling laws predict loss, not capabilities; capabilities emerge less predictably
- Data quality matters as much as quantity, perhaps more
- Scaling plateaus appear in specific tasks, even as average loss keeps dropping
- The cost of frontier training runs grew from millions to hundreds of millions of dollars
The big idea: GPT-3 plus scaling laws turned AI into a bet on scale. For a while, the bet paid off relentlessly. Whether it continues to pay off is one of the central questions of current AI research.
