A 2015 paper from Microsoft Research let neural networks go more than 150 layers deep by adding a shortcut connection.
After AlexNet in 2012, everyone wanted deeper networks. VGG reached 16 to 19 layers in 2014, and GoogLeNet reached 22. But researchers kept hitting a wall: past a certain depth, adding layers made networks worse, not better, even on the training data, so the cause was not overfitting.
In December 2015, Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun at Microsoft Research Asia published "Deep Residual Learning for Image Recognition". They trained networks 152 layers deep and won the ImageNet 2015 classification challenge (ILSVRC 2015) by a comfortable margin.
ResNets were a contagious idea. Skip connections, or variants of them, now appear in almost every successful architecture: U-Nets for segmentation, DenseNets, and, crucially, Transformers, which wrap a residual connection around each attention and feed-forward sublayer.
Is learning better networks as easy as stacking more layers? An obstacle to answering this question was the vanishing or exploding gradient problem.
— He et al., 2015
The big idea: a tiny architectural tweak unlocked an order of magnitude of depth. The lesson, repeated throughout AI, is that the right small idea compounds enormously.
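The tweak can be sketched in a few lines. This is a minimal illustration, assuming a toy fully connected block in plain Python rather than the convolutional layers used in the paper; the helper names (`relu`, `matvec`, `plain_block`, `residual_block`) are made up for this example. A plain block computes F(x), while a residual block computes F(x) + x, so it only has to learn the difference from its input.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]

def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def plain_block(x, w1, w2):
    """Two stacked layers with no shortcut: relu(F(x))."""
    return relu(matvec(w2, relu(matvec(w1, x))))

def residual_block(x, w1, w2):
    """The same two layers plus a shortcut: relu(F(x) + x)."""
    f = matvec(w2, relu(matvec(w1, x)))             # residual branch F(x)
    return relu([fi + xi for fi, xi in zip(f, x)])  # add the input back

# Illustrative input (non-negative, as if it came out of an earlier ReLU).
x = [0.5, 1.0, 2.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all weights at zero, the plain block wipes out its input,
# while the residual block passes it through unchanged.
plain_out = plain_block(x, zeros, zeros)
residual_out = residual_block(x, zeros, zeros)
print(plain_out)
print(residual_out)
```

With the weights driven to zero, the residual block reduces to the identity, so extra layers can "do nothing" at no cost. A plain block has to learn the identity mapping through its weights, which is one intuition for why very deep plain networks were harder to train.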
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-history-resnets-creators
What is the main idea of "ResNets and the Depth Breakthrough"?
Which concept is most central to "ResNets and the Depth Breakthrough"?
Which use of AI fits this topic best?
What should a careful learner remember about "The one-line idea"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about ResNet be treated?
Name one way to verify an AI answer about ResNet.
Which action would help you apply "ResNets and the Depth Breakthrough" responsibly?