Grokking: Learning That Snaps Into Place
Sometimes a network memorizes, then — long after you would have stopped training — suddenly generalizes. That is grokking, a real and weird phenomenon.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. A Weird Training Curve
2. Grokking
3. Generalization
4. Double descent
Section 1
A Weird Training Curve
In 2022, Power, Burda, Edwards, and colleagues at OpenAI reported something strange: a small transformer trained on modular arithmetic would reach ~100% training accuracy while test accuracy stayed near zero for thousands of epochs. Then, suddenly, test accuracy would snap to ~100%. They called the phenomenon grokking, after the Heinlein word for 'to understand fully.'
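A minimal sketch of this kind of experiment, assuming PyTorch. It swaps the paper's small transformer and modular division for a small MLP on (a + b) mod p, which can show the same delayed generalization when weight decay is high; the hyperparameters here are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 97  # modulus: every pair (a, b) with a, b in [0, p) is one example

# Build all p*p operand pairs and split them roughly 50/50 into train/test.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
train_idx, test_idx = perm[: len(perm) // 2], perm[len(perm) // 2 :]

def encode(idx):
    # One-hot encode both operands and concatenate them into one input vector.
    a = nn.functional.one_hot(pairs[idx, 0], p).float()
    b = nn.functional.one_hot(pairs[idx, 1], p).float()
    return torch.cat([a, b], dim=1), labels[idx]

x_train, y_train = encode(train_idx)
x_test, y_test = encode(test_idx)

model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))

# Heavy weight decay is the ingredient that slowly pushes the network away
# from the memorizing solution and toward a simpler one that generalizes.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(50_000):  # far longer than needed to fit the training set
    opt.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            train_acc = (model(x_train).argmax(1) == y_train).float().mean()
            test_acc = (model(x_test).argmax(1) == y_test).float().mean()
        # Expect train accuracy to hit ~1.0 early while test accuracy lags for
        # a long stretch before (possibly) snapping upward.
        print(f"step {step:6d}  train {train_acc:.2f}  test {test_acc:.2f}")
```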
The classic curve
Training accuracy saturates. Test accuracy stays low — then snaps up far later.
Accuracy
1.0 |     Train ______________________________
    |       /                        Test ____
    |      /                             /
0.5 |     /                             /
    |    /                             /
0.0 |___/_____________________________/________ Time
        memorize                  generalize
        (early)                   (much later)

What is actually happening
- Early: the model memorizes the training examples using complex internal representations
- Later: weight decay slowly pressures the model toward simpler solutions
- At the transition: an algorithmic circuit forms that actually generalizes
- After: performance generalizes and the internal representations simplify (see the weight-norm sketch below)
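One way to watch that simplification happen is to log a crude complexity proxy, such as the total L2 norm of the weights, alongside train and test accuracy. A minimal sketch, assuming PyTorch; `weight_norm` is an illustrative helper, not a library function:

```python
import torch
import torch.nn as nn

def weight_norm(model: nn.Module) -> float:
    # Total L2 norm of all parameters: a crude proxy for solution complexity.
    return torch.sqrt(sum(p.pow(2).sum() for p in model.parameters())).item()

# Usage: call this inside the training loop from the earlier sketch and log it
# next to train/test accuracy. In many reported grokking runs the norm drops
# around the same time test accuracy snaps upward.
demo = nn.Sequential(nn.Linear(2 * 97, 256), nn.ReLU(), nn.Linear(256, 97))
print(weight_norm(demo))
```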
Why it matters beyond the toy
Grokking suggests that 'more training' can sometimes qualitatively change a model's behavior — not just improve a score but switch to a different algorithm internally. That has implications for how we evaluate safety during training.
1. Scores can plateau, then jump, without any architectural change
2. Early-stopped models can be qualitatively different from fully-trained ones
3. Training dynamics (not just final weights) matter for capability forecasting (see the sketch after this list)
4. Mechanistic interpretability is the tool that catches these transitions
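Point 3 is the kind of thing you can check mechanically from training logs. A minimal sketch of a jump detector over a logged test-accuracy curve; `find_jump` and its threshold are hypothetical choices for illustration, not a standard method:

```python
def find_jump(accs, threshold=0.3):
    """Return the first index where accuracy exceeds the running maximum so far
    by more than `threshold`, or None if no such jump occurs."""
    best = accs[0]
    for i, acc in enumerate(accs[1:], start=1):
        if acc - best > threshold:
            return i
        best = max(best, acc)
    return None

# A flat-then-snapping test curve like the schematic above.
curve = [0.01] * 40 + [0.05, 0.40, 0.90, 0.99]
print(find_jump(curve))  # -> 41: the step where test accuracy snapped
```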
“We show that neural networks can 'grok' algorithmic tasks, generalizing well after overfitting the training set.”
The big idea: learning is not monotonic. Grokking shows that long after training looks 'done,' the internal algorithm can still be changing.
