Eight Google authors replaced recurrence with attention and quietly launched the modern AI era.
In June 2017, eight authors from Google Brain and Google Research, with Ashish Vaswani listed first, posted "Attention Is All You Need" on arXiv. The paper proposed the Transformer, an architecture with no recurrence and no convolutions, just attention, feed-forward layers, and residual connections.
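To make those three ingredients concrete, here is a minimal sketch of one Transformer-style block in plain Python. It is an illustration under simplifying assumptions, not the paper's implementation: it uses a single attention head, leaves out layer normalization, biases, and positional encodings, and the function names and toy weights are made up for this example.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain list-of-lists matrix product: (n x m) times (m x p) gives (n x p).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    # Element-wise sum, used for the residual connections.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def self_attention(x, w_q, w_k, w_v):
    # Project the inputs into queries, keys, and values.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Scaled dot-product scores: every position is compared with every position.
    scores = [
        [sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        for q_row in q
    ]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights

def feed_forward(x, w1, w2):
    # Position-wise two-layer network with a ReLU in between.
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def transformer_block(x, w_q, w_k, w_v, w1, w2):
    # Simplified block: attention plus residual, then feed-forward plus residual.
    attended, weights = self_attention(x, w_q, w_k, w_v)
    x = add(x, attended)
    x = add(x, feed_forward(x, w1, w2))
    return x, weights

# Toy input: 3 tokens, each a 2-dimensional vector (hypothetical values).
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = transformer_block(tokens, identity, identity, identity, identity, identity)
```

Each row of `weights` sums to 1 and says how much one token draws from every other token. Nothing in the block loops over time steps, which is why all positions can be processed at once.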
The immediate target was machine translation, where the Transformer beat the best previously published models on the WMT 2014 English-to-German benchmark at a fraction of the training cost. Within a few years, the paper had become one of the most cited in AI history.
The Transformer became the common substrate for everything that followed. BERT from Google in 2018 used the encoder. GPT from OpenAI, also in 2018, used the decoder. Vision Transformers, Whisper, AlphaFold, Stable Diffusion, and the current generation of chat models all build on it or on its attention mechanism.
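The practical difference between the encoder and decoder styles comes down to which positions each token may attend to. The sketch below, in plain Python, assumes a hypothetical helper named `attention_weights` and uses equal toy scores. With no mask, every token sees every other token, as in an encoder. With a causal mask, each token sees only itself and earlier tokens, as in a GPT-style decoder.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(scores, causal=False):
    # scores[i][j] is how strongly token i matches token j (before softmax).
    weights = []
    for i, row in enumerate(scores):
        if causal:
            # Hide future positions (j > i) by setting their scores to -inf.
            row = [s if j <= i else float("-inf") for j, s in enumerate(row)]
        weights.append(softmax(row))
    return weights

# Toy scores for 3 tokens: all equal, so only the mask changes the result.
scores = [[0.0, 0.0, 0.0] for _ in range(3)]

encoder_style = attention_weights(scores)               # every row spreads over all 3 tokens
decoder_style = attention_weights(scores, causal=True)  # row i spreads over tokens 0..i only
```

In `decoder_style`, the first token attends only to itself and the second splits its attention evenly between the first two tokens. That restriction is what lets a decoder be trained to predict the next token without seeing it.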
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms.
— Vaswani et al., 2017
The big idea: one clean architectural choice, attention without recurrence, unlocked a decade of scaling. Nearly every LLM you use today traces back to this paper.
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-history-attention-2017-creators
What is the main idea of "Attention Is All You Need, 2017"?
Which concept is most central to "Attention Is All You Need, 2017"?
Which use of AI fits this topic best?
What should a careful learner remember about "What self-attention does"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about Transformer be treated?
Name one way to verify an AI answer about Transformer.
Which action would help you apply "Attention Is All You Need, 2017" responsibly?