Lesson 319 of 2116
Attention Is All You Need, 2017
Eight Google authors replaced recurrence with attention and quietly launched the modern AI era.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. The Paper That Rewired NLP
2. Transformer
3. Self-attention
4. Vaswani
Section 1
The Paper That Rewired NLP
In June 2017, eight authors from Google Brain and Google Research, led by Ashish Vaswani, posted Attention Is All You Need on arXiv. The paper proposed the Transformer, an architecture with no recurrence and no convolutions. Just attention, feed-forward layers, and residual connections.
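The core operation is scaled dot-product attention, which the paper writes as softmax(QKᵀ / √d_k)V. Below is a minimal single-head sketch in NumPy, just to make the shape of the computation concrete; the full Transformer adds multi-head projections, positional encodings, residual connections, layer normalization, and feed-forward layers on top of this.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V  -- the attention formula from the paper
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                                   # weighted sum of values

# Toy self-attention: 4 positions, 8-dimensional vectors (sizes are illustrative)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)              # Q = K = V = x
print(out.shape)  # (4, 8)
```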
The immediate target was machine translation, where Transformers beat the best recurrent models on English-to-German while training several times faster. Within two years, the paper would be one of the most cited in AI history.
Why Transformers took over
1. Parallel training: unlike RNNs, attention processes all positions at once, making full use of GPUs (see the sketch after this list)
2. Long-range dependencies: attention sees the whole sequence, not just a compressed state
3. Scaling: performance improved smoothly with more parameters, more data, and more compute
4. Generality: the same architecture worked for text, images, audio, code, and proteins
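To make the parallelism point concrete, here is a hypothetical toy contrast (not code from the paper): a recurrent update has to loop over positions one at a time, while an attention step is a single matrix product over the whole sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 8                        # toy sequence length and model width
x = rng.normal(size=(T, d))

# Recurrent style: each hidden state depends on the previous one,
# so the T positions must be processed sequentially.
W = rng.normal(size=(d, d)) * 0.1
h = np.zeros(d)
for t in range(T):
    h = np.tanh(x[t] @ W + h)      # sequential bottleneck

# Attention style: every position attends to every other position in one
# batched matrix product, so all T outputs can be computed in parallel.
scores = x @ x.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)
out = weights @ x                  # shape (T, d), computed all at once
```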
The Transformer became the common substrate for everything that followed. BERT from Google in 2018 used the encoder. GPT from OpenAI used the decoder. Vision Transformers, Whisper, AlphaFold, Stable Diffusion, and the current generation of chat models all rest on it.
“We propose a new simple network architecture, the Transformer, based solely on attention mechanisms.”
The big idea: one clean architectural choice, attention without recurrence, unlocked a decade of scaling. Every LLM you use today traces directly to this paper.
Related lessons
Keep going
Creators · 55 min
Transformers Under the Hood
Attention, positional encoding, residual streams. A walk through the architecture that powers every frontier language model today.
Creators · 55 min
The Three Ingredients: Data, Compute, Algorithms (Capstone)
Every AI breakthrough of the past decade rests on three interacting ingredients. Synthesize everything you have learned into one working model.
Creators · 45 min
Uncertainty Quantification in LLMs
A model that says 'I am 95 percent sure' and is wrong 40 percent of the time is miscalibrated. Measuring that gap is uncertainty quantification.
