Backpropagation Rediscovered, 1986
Rumelhart, Hinton, and Williams published the algorithm that would eventually train nearly every modern neural network.
The Algorithm That Trained Deep Networks
In 1986, David Rumelhart, Geoffrey Hinton, and Ronald Williams published "Learning representations by back-propagating errors" in Nature. Their paper showed how to train multi-layer neural networks by using the chain rule to propagate error gradients from the output layer back toward the input.
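The chain-rule idea can be written compactly. The notation here is an assumption for illustration, not taken from the paper: layer index l runs from 1 to L, W and b are a layer's weights and biases, f is the activation function, a is a layer's output, and the script L is the loss.

```latex
\begin{aligned}
z^{(l)} &= W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = f\!\left(z^{(l)}\right) \\
\delta^{(L)} &= \nabla_{a^{(L)}} \mathcal{L} \odot f'\!\left(z^{(L)}\right) \\
\delta^{(l)} &= \left( \left(W^{(l+1)}\right)^{\top} \delta^{(l+1)} \right) \odot f'\!\left(z^{(l)}\right) \\
\frac{\partial \mathcal{L}}{\partial W^{(l)}} &= \delta^{(l)} \left(a^{(l-1)}\right)^{\top}, \qquad
\frac{\partial \mathcal{L}}{\partial b^{(l)}} = \delta^{(l)}
\end{aligned}
```

Each error term for a layer is computed from the error term of the layer after it, which is why the gradients flow from output back to input.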
How backprop works, in five steps
1. Forward pass: compute the prediction layer by layer
2. Compute the loss against the true label
3. Backward pass: compute gradients from output back to input
4. Update weights by a small step against the gradient
5. Repeat across the dataset, many times
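The five steps above can be sketched in plain Python. This is a minimal illustration, not the setup from the 1986 paper: the network shape (one tanh hidden layer and a linear output), the squared-error loss, and every name here (`init_params`, `forward`, `backward`, `sgd_step`, `train`, `mean_loss`) are assumptions chosen for readability.

```python
import math
import random


def init_params(n_in, n_hidden, seed=0):
    """Small random weights, zero biases (an illustrative choice)."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    return W1, b1, W2, b2


def forward(params, x):
    """Step 1: compute the prediction layer by layer."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * hj for w, hj in zip(W2, h)) + b2
    return h, y


def loss_fn(y, target):
    """Step 2: squared-error loss against the true label."""
    return 0.5 * (y - target) ** 2


def backward(params, x, h, y, target):
    """Step 3: apply the chain rule from the output back to the input."""
    _, _, W2, _ = params
    dy = y - target                          # dLoss/dy
    dW2 = [dy * hj for hj in h]              # output-layer weight gradients
    db2 = dy
    dh = [dy * w for w in W2]                # error pushed back to the hidden layer
    dz = [dhj * (1.0 - hj * hj) for dhj, hj in zip(dh, h)]  # tanh'(z) = 1 - tanh(z)^2
    dW1 = [[dzj * xi for xi in x] for dzj in dz]
    db1 = dz
    return dW1, db1, dW2, db2


def sgd_step(params, grads, lr):
    """Step 4: move each weight a small step against its gradient."""
    W1, b1, W2, b2 = params
    dW1, db1, dW2, db2 = grads
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, dW1)]
    b1 = [b - lr * g for b, g in zip(b1, db1)]
    W2 = [w - lr * g for w, g in zip(W2, dW2)]
    b2 = b2 - lr * db2
    return W1, b1, W2, b2


def train(params, data, lr=0.05, epochs=200):
    """Step 5: repeat across the dataset, many times."""
    for _ in range(epochs):
        for x, target in data:
            h, y = forward(params, x)
            grads = backward(params, x, h, y, target)
            params = sgd_step(params, grads, lr)
    return params


def mean_loss(params, data):
    """Average loss over a list of (inputs, target) pairs."""
    return sum(loss_fn(forward(params, x)[1], t) for x, t in data) / len(data)
```

Calling `train(init_params(2, 3), data)` on a small list of `(inputs, target)` pairs runs the whole loop. Comparing the output of `backward` against finite-difference estimates of the loss is a quick way to check that the gradients are right.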
“It was a really good feeling when we realized this was going to work.”
What backprop needed to fully arrive
- Massive labeled datasets, eventually provided by the internet
- Parallel compute, eventually provided by NVIDIA GPUs
- Better activation functions like ReLU, which mitigate vanishing gradients
- Patience from a handful of researchers, especially Hinton, Bengio, and LeCun
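The ReLU point in the list above can be illustrated numerically. The sketch below is a deliberate simplification: it multiplies only the activation derivatives along one path through the network and ignores the weights, and the helper names (`sigmoid_grad`, `relu_grad`, `chained_grad`) are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid; it never exceeds 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)


def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


def chained_grad(local_grad, pre_activations):
    """Multiply the activation derivatives along one path (weights ignored)."""
    g = 1.0
    for z in pre_activations:
        g *= local_grad(z)
    return g


# Ten layers, each evaluated where the sigmoid derivative is largest (z = 0).
sigmoid_factor = chained_grad(sigmoid_grad, [0.0] * 10)
# Ten layers of ReLU units that are all active (positive pre-activation).
relu_factor = chained_grad(relu_grad, [1.0] * 10)
```

Even in the sigmoid's best case the gradient is scaled by 0.25 at every layer, so ten layers shrink it by a factor of 0.25 to the tenth power, roughly one in a million. Active ReLU units pass the gradient through unchanged, although a ReLU unit with a negative input passes nothing at all.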
The big idea: the central algorithm of modern AI was published in 1986 and then sat mostly dormant for a generation. Sometimes the bottleneck is not the math but the hardware and data around it.
