Grokking: Learning That Snaps Into Place
Sometimes a network memorizes, then — long after you would have stopped training — suddenly generalizes. That is grokking, a real and weird phenomenon.
Lesson map
What this lesson covers
Learning path
The main moves in order
1. A Weird Training Curve
2. Grokking
3. Generalization
4. Double descent
Section 1
A Weird Training Curve
In 2022, Power, Burda, Edwards, and colleagues at OpenAI reported something strange: a small transformer trained on modular arithmetic would reach ~100% training accuracy while test accuracy stayed near zero for thousands of epochs. Then, suddenly, test accuracy would snap to ~100%. They called the phenomenon grokking, after the Heinlein word for 'to understand fully.'
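A minimal sketch of this kind of experiment, assuming PyTorch. It swaps the paper's small transformer and modular division for a small MLP on (a + b) mod p, which can show the same delayed generalization when weight decay is high; the hyperparameters here are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 97  # modulus: every pair (a, b) with a, b in [0, p) is one example

# Build all p*p operand pairs and split them roughly 50/50 into train/test.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
train_idx, test_idx = perm[: len(perm) // 2], perm[len(perm) // 2 :]

def encode(idx):
    # One-hot encode both operands and concatenate them into one input vector.
    a = nn.functional.one_hot(pairs[idx, 0], p).float()
    b = nn.functional.one_hot(pairs[idx, 1], p).float()
    return torch.cat([a, b], dim=1), labels[idx]

x_train, y_train = encode(train_idx)
x_test, y_test = encode(test_idx)

model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))

# Heavy weight decay is the ingredient that slowly pushes the network away
# from the memorizing solution and toward a simpler one that generalizes.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(50_000):  # far longer than needed to fit the training set
    opt.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            train_acc = (model(x_train).argmax(1) == y_train).float().mean()
            test_acc = (model(x_test).argmax(1) == y_test).float().mean()
        # Expect train accuracy to hit ~1.0 early while test accuracy lags for
        # a long stretch before (possibly) snapping upward.
        print(f"step {step:6d}  train {train_acc:.2f}  test {test_acc:.2f}")
```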
The classic curve
Training accuracy saturates. Test accuracy stays low — then snaps up far later.
Accuracy
1.0 |     Train ______________________________
    |       /                        Test ____
    |      /                             /
0.5 |     /                             /
    |    /                             /
0.0 |___/_____________________________/________ Time
        memorize                  generalize
        (early)                   (much later)

What is actually happening
- Early: the model memorizes the training examples using complex internal representations
- Later: weight decay slowly pressures the model toward simpler solutions
- At the transition: an algorithmic circuit forms that actually generalizes
- After: performance generalizes and the internal representations simplify (see the weight-norm sketch below)
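One way to watch that simplification happen is to log a crude complexity proxy, such as the total L2 norm of the weights, alongside train and test accuracy. A minimal sketch, assuming PyTorch; `weight_norm` is an illustrative helper, not a library function:

```python
import torch
import torch.nn as nn

def weight_norm(model: nn.Module) -> float:
    # Total L2 norm of all parameters: a crude proxy for solution complexity.
    return torch.sqrt(sum(p.pow(2).sum() for p in model.parameters())).item()

# Usage: call this inside the training loop from the earlier sketch and log it
# next to train/test accuracy. In many reported grokking runs the norm drops
# around the same time test accuracy snaps upward.
demo = nn.Sequential(nn.Linear(2 * 97, 256), nn.ReLU(), nn.Linear(256, 97))
print(weight_norm(demo))
```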
Why it matters beyond the toy
Grokking suggests that 'more training' can sometimes qualitatively change a model's behavior — not just improve a score but switch to a different algorithm internally. That has implications for how we evaluate safety during training.
1. Scores can plateau, then jump, without any architectural change
2. Early-stopped models can be qualitatively different from fully-trained ones
3. Training dynamics (not just final weights) matter for capability forecasting (see the sketch after this list)
4. Mechanistic interpretability is the tool that catches these transitions
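Point 3 is the kind of thing you can check mechanically from training logs. A minimal sketch of a jump detector over a logged test-accuracy curve; `find_jump` and its threshold are hypothetical choices for illustration, not a standard method:

```python
def find_jump(accs, threshold=0.3):
    """Return the first index where accuracy exceeds the running maximum so far
    by more than `threshold`, or None if no such jump occurs."""
    best = accs[0]
    for i, acc in enumerate(accs[1:], start=1):
        if acc - best > threshold:
            return i
        best = max(best, acc)
    return None

# A flat-then-snapping test curve like the schematic above.
curve = [0.01] * 40 + [0.05, 0.40, 0.90, 0.99]
print(find_jump(curve))  # -> 41: the step where test accuracy snapped
```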
“We show that neural networks can 'grok' algorithmic tasks, generalizing well after overfitting the training set.”
The big idea: learning is not monotonic. Grokking shows that long after training looks 'done,' the internal algorithm can still be changing.
