ResNets and the Depth Breakthrough

A 2015 paper from Microsoft Research let neural networks go 152 layers deep by adding a shortcut.

28 min · Reviewed 2026

The Depth Problem

After AlexNet in 2012, everyone wanted deeper networks. VGG reached 16 to 19 layers in 2014, and GoogLeNet reached 22 the same year. But researchers kept hitting a wall: past a certain depth, adding layers made networks worse, not better, even on the training data, so overfitting could not be the explanation.

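In December 2015, Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun at Microsoft Research Asia published "Deep Residual Learning for Image Recognition." Their fix was the skip connection: each block adds its input back to its output, y = F(x) + x, so the layers only have to learn the residual F(x). With it they trained networks 152 layers deep and won the ImageNet 2015 classification challenge with a 3.57% top-5 error.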
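
The shortcut can be sketched in a few lines. This is a minimal illustration in NumPy, not the paper's implementation: the two-layer branch, the names relu and residual_block, and the 4-by-8 input are assumptions made for the example, and the ReLU that the paper applies after the addition is left out to keep the identity case easy to see.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Return F(x) + x, where F is two linear layers with a ReLU between them."""
    f = relu(x @ w1) @ w2  # residual branch F(x)
    return f + x           # skip connection: add the input back unchanged

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # hypothetical batch: 4 examples, 8 features

# With all-zero weights, F(x) is zero and the block is exactly the identity.
zeros = np.zeros((8, 8))
identity_out = residual_block(x, zeros, zeros)

# With small random weights, the output is the input plus a small correction.
w1 = rng.normal(scale=0.1, size=(8, 8))
w2 = rng.normal(scale=0.1, size=(8, 8))
out = residual_block(x, w1, w2)

print(np.allclose(identity_out, x))
print(out.shape)
```

Because the input is added back unchanged, a block whose weights shrink toward zero simply passes its input through, so stacking more blocks should not make the network harder to fit.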

Why this fixed depth

Gradients could flow directly through skip connections, dodging the vanishing gradient problem
Identity mappings became a safe default; layers could learn to do nothing if that was best
Optimization got much easier, so deeper became genuinely better
Accuracy kept improving as depth grew, and follow-up work trained networks about 1,000 layers deep
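
The gradient-flow point can be checked numerically. The sketch below is a simplification under stated assumptions: it uses purely linear layers with small random weights (no nonlinearity, no normalization), so a plain layer has Jacobian W and a residual layer has Jacobian I + W. The function name stack_jacobian and the chosen sizes are hypothetical. The Jacobian of the whole stack is the product of the per-layer Jacobians, and its norm indicates how much gradient reaches the first layer.

```python
import numpy as np

def stack_jacobian(weights, residual):
    """Jacobian of the output with respect to the input for a stack of linear layers."""
    dim = weights[0].shape[0]
    jac = np.eye(dim)
    for w in weights:
        # Plain layer: y = W x. Residual layer: y = x + W x.
        layer_jac = np.eye(dim) + w if residual else w
        jac = layer_jac @ jac
    return jac

rng = np.random.default_rng(0)
dim, depth, scale = 16, 50, 0.1  # illustrative sizes, not taken from the paper
weights = [
    rng.normal(0.0, scale / np.sqrt(dim), size=(dim, dim))
    for _ in range(depth)
]

plain_norm = np.linalg.norm(stack_jacobian(weights, residual=False))
residual_norm = np.linalg.norm(stack_jacobian(weights, residual=True))
print(f"plain: {plain_norm:.3e}  residual: {residual_norm:.3e}")
```

With these settings the plain stack's Jacobian shrinks to practically zero after 50 layers, while the residual stack's stays on the order of one, because every residual layer contributes an identity term that the gradient can pass through untouched.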

ResNets were a contagious idea. Skip connections or variants of them now appear in almost every successful architecture: U-Nets for segmentation, DenseNets, and, crucially, Transformers, which wrap a residual connection around each attention and feed-forward sublayer.

Is learning better networks as easy as stacking more layers? An obstacle to answering this question was the vanishing or exploding gradient problem.
— He et al., 2015

The big idea: a tiny architectural tweak unlocked an order of magnitude of depth. The lesson, repeated throughout AI, is that the right small idea compounds enormously.

End-of-lesson check

8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-history-resnets-creators

What is the main idea of "ResNets and the Depth Breakthrough"?
1. A 2015 paper from Microsoft Research let neural networks go 152 layers deep by adding a shortcut.
2. Use AI as the final authority for the whole decision
3. Avoid checking the answer once it sounds polished
4. Focus only on speed instead of judgment
Which concept is most central to "ResNets and the Depth Breakthrough"?
1. skip connection
2. ResNet
3. He
4. vanishing gradient
Which use of AI fits this topic best?
1. Let the AI decide what matters without your review
2. Use the answer before checking whether it fits the situation
3. Gradients could flow directly through skip connections, dodging the vanishing gradient problem
4. Treat the AI output as automatically correct
What should a careful learner remember about "The one-line idea"?
1. Use AI to draft or organize ideas about ResNet, then verify before acting.
2. Skip the context so the tool can guess faster
3. Treat the output as private even after sharing it online
4. Use the answer without checking the source
You want to use AI after this lesson. What is the safest next step?
1. Act immediately because the AI answer is written clearly
2. Use AI for drafting and comparison, but verify before publishing or relying on it.
3. Hide uncertainty so the final answer looks cleaner
4. Use private or sensitive details before checking permission
How should AI output about ResNet be treated?
1. As proof that no other source is needed
2. As a replacement for context, consent, or expert review
3. As a draft or helper output that still needs human judgment and verification
4. As something that becomes correct when it sounds confident
Name one way to verify an AI answer about ResNet.
Which action would help you apply "ResNets and the Depth Breakthrough" responsibly?
1. Use the tool to avoid thinking through the tradeoff
2. Keep going even if the output conflicts with a trusted source
3. Treat the AI output as automatically correct
4. Identity mappings became a safe default; layers could learn to do nothing if that was best
