The past decade of AI progress came from a simple, ruthless law: more compute and more data yield predictable improvements. Here is the math behind it.
In 2020, researchers at OpenAI published a paper showing that language model performance follows a predictable curve. Add more parameters, more data, and more compute, and loss falls smoothly, following a power law rather than a straight line. The industry has been chasing that curve ever since.
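To make the shape of that curve concrete, here is a minimal sketch of a power-law loss in Python. The function names and both constants (`scale`, `exponent`) are illustrative assumptions, not values fitted in the 2020 paper. The point is only that each doubling of compute multiplies loss by the same fixed factor.

```python
def power_law_loss(compute: float, scale: float = 10.0, exponent: float = 0.05) -> float:
    """Illustrative loss curve: loss = scale * compute ** (-exponent).

    `scale` and `exponent` are hypothetical constants chosen for the demo.
    """
    if compute <= 0:
        raise ValueError("compute must be positive")
    return scale * compute ** (-exponent)


def loss_ratio_per_doubling(exponent: float = 0.05) -> float:
    """Factor by which loss shrinks each time compute doubles."""
    return 2.0 ** (-exponent)


# Doubling compute always shrinks loss by the same ratio,
# no matter where on the curve you start.
for compute in (1e3, 2e3, 4e3, 8e3):
    print(f"compute={compute:.0e}  loss={power_law_loss(compute):.4f}")
```

Under these assumed constants, every doubling of compute trims loss by about 3.4%: steady and predictable, but never free.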
The 2022 Chinchilla paper from DeepMind showed that earlier models were undertrained. For a fixed compute budget, the optimal ratio is roughly 20 tokens of training data per parameter. Many older models had too many parameters and too little data, which wasted compute.
| Model | Parameters | Training tokens |
|---|---|---|
| GPT-3 (2020) | 175B | 300B |
| Chinchilla (2022) | 70B | 1.4T |
| Llama 3 (2024) | 70B | 15T |
| Modern frontier (2025-2026) | variable | tens of trillions |
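The table's rows can be checked against the 20-tokens-per-parameter rule of thumb with a few lines of Python. This is a rough sketch: the figures are copied from the table above, and the helper names are made up for illustration.

```python
# Rough rule of thumb from the Chinchilla result: about 20 tokens per parameter.
CHINCHILLA_TOKENS_PER_PARAM = 20

# (parameters, training tokens), copied from the table above.
MODELS = {
    "GPT-3 (2020)": (175e9, 300e9),
    "Chinchilla (2022)": (70e9, 1.4e12),
    "Llama 3 (2024)": (70e9, 15e12),
}


def tokens_per_param(params: float, tokens: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


def compute_optimal_tokens(params: float) -> float:
    """Tokens suggested by the 20-per-parameter rule of thumb."""
    return CHINCHILLA_TOKENS_PER_PARAM * params


for name, (params, tokens) in MODELS.items():
    ratio = tokens_per_param(params, tokens)
    suggested = compute_optimal_tokens(params)
    print(f"{name}: {ratio:.1f} tokens/param "
          f"(rule of thumb suggests {suggested:.2e} tokens)")
```

By this yardstick GPT-3 sits far below 20 tokens per parameter, Chinchilla lands on it, and Llama 3 is trained well past it.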
Because loss follows a power law, each further improvement requires multiplying compute, and therefore cost, rather than adding a fixed amount. Training GPT-4 reportedly cost over a hundred million dollars. The returns are real but costly. That is why inference efficiency, mixture of experts, and better data now matter as much as pure scale.
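One way to see why each improvement gets pricier: if loss follows a power law in compute, a fixed percentage drop in loss requires multiplying compute by a fixed factor. The sketch below assumes a hypothetical exponent of 0.05 and a made-up function name. Real exponents depend on the model family and the data.

```python
def compute_multiplier(loss_reduction: float, exponent: float = 0.05) -> float:
    """Factor by which compute must grow to cut loss by `loss_reduction`.

    Assumes loss = scale * compute ** (-exponent), so
    new_loss / old_loss = multiplier ** (-exponent).
    The default exponent is an illustrative assumption.
    """
    if not 0.0 < loss_reduction < 1.0:
        raise ValueError("loss_reduction must be between 0 and 1")
    if exponent <= 0.0:
        raise ValueError("exponent must be positive")
    return (1.0 - loss_reduction) ** (-1.0 / exponent)


for reduction in (0.05, 0.10, 0.20):
    factor = compute_multiplier(reduction)
    print(f"{reduction:.0%} lower loss needs about {factor:.1f}x the compute")
```

With the assumed exponent, a 10% drop in loss needs roughly 8 times the compute, which is the kind of arithmetic that turns a modest gain into a very large bill.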
The bitter lesson is that general methods that leverage computation are ultimately the most effective.
— Rich Sutton
The big idea: AI's recent leap was largely the result of executing the scaling recipe at industrial scale. Knowing the recipe demystifies both the progress and its current bottlenecks.
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-builders-scaling-laws
What is the main idea of "Scaling Laws: Why Bigger Worked"?
Which concept is most central to "Scaling Laws: Why Bigger Worked"?
Which use of AI fits this topic best?
What should a careful learner remember about "Power laws, not linear"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about scaling laws be treated?
Name one way to verify an AI answer about scaling laws.
Which action would help you apply "Scaling Laws: Why Bigger Worked" responsibly?