ResNets and the Depth Breakthrough
A 2015 paper from Microsoft Research let neural networks go 152 layers deep by adding a shortcut.
Lesson map
What this lesson covers
Learning path
The main moves in order
- 1. The Depth Problem
- 2. ResNet
- 3. Skip connection
- 4. He
Concept cluster
Terms to connect while reading
Section 1
The Depth Problem
After AlexNet in 2012, everyone wanted deeper networks. VGG reached 16 to 19 layers in 2014; GoogLeNet, 22. But researchers kept hitting a wall: making networks deeper made them worse, not better, even on training data.
In December 2015, Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun at Microsoft Research Asia published Deep Residual Learning for Image Recognition. They trained networks 152 layers deep and won ImageNet by a comfortable margin.
Why this fixed depth
- Gradients could flow directly through skip connections, dodging the vanishing gradient problem
- Identity mappings became a safe default; layers could learn to do nothing if that was best
- Optimization got much easier, so deeper became genuinely better
- Performance scaled almost monotonically with depth, up to 1000 layers in follow-up work
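The "identity as a safe default" point above can be seen in a few lines of code. The sketch below is a minimal, illustrative residual block (not the paper's exact architecture): the block computes a small learned transform F(x) and adds the input back in, so when the weights are zero the block simply passes x through.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """Illustrative residual block: y = relu(F(x) + x).

    F is a toy two-layer transform; the names w1/w2 are assumptions
    for this sketch, not from the ResNet paper.
    """
    f = relu(x @ w1) @ w2   # the learned residual F(x)
    return relu(f + x)      # skip connection: identity path + residual

# With all weights zero, F(x) = 0 and the block reduces to the identity
# (for non-negative inputs) -- a layer can "learn to do nothing".
x = np.array([1.0, 2.0, 3.0])
w_zero = np.zeros((3, 3))
out = residual_block(x, w_zero, w_zero)
```

Because the identity path bypasses F entirely, the gradient of the output with respect to x always contains a direct term, which is why stacking many such blocks does not starve early layers of gradient the way plain stacks of layers can.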
ResNets were a contagious idea. Skip connections or variants of them now appear in almost every successful architecture: U-Nets for segmentation, DenseNets, and crucially Transformers, which use residual connections around each attention block.
“Is learning better networks as easy as stacking more layers? An obstacle to answering this question was the vanishing or exploding gradient problem.”
The big idea: a tiny architectural tweak unlocked an order of magnitude of depth. The lesson, repeated throughout AI, is that the right small idea compounds enormously.
