Sometimes a network memorizes its training data, then, long after you would have stopped training, suddenly generalizes. That is grokking, a real and strange phenomenon.
In 2022, Power, Burda, Edwards, and colleagues at OpenAI reported something strange: a small transformer trained on modular arithmetic would reach ~100% training accuracy while test accuracy stayed near zero for thousands of epochs. Then, suddenly, test accuracy would snap to ~100%. They called the phenomenon grokking, after the Heinlein word for 'to understand fully.'
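To make the setup concrete, here is a minimal sketch of the kind of dataset those experiments use: all pairs (a, b) with the label (a + b) mod p, split into train and test. The modulus p = 97 and the 50/50 split are assumptions for illustration; the paper sweeps over several operations and split fractions.

```python
import random

def make_modular_addition_data(p=97, train_frac=0.5, seed=0):
    """Enumerate all p*p input pairs, label each with (a + b) % p,
    and randomly split into train and test sets."""
    pairs = [(a, b) for a in range(p) for b in range(p)]
    rng = random.Random(seed)
    rng.shuffle(pairs)
    cut = int(train_frac * len(pairs))
    train = [((a, b), (a + b) % p) for a, b in pairs[:cut]]
    test = [((a, b), (a + b) % p) for a, b in pairs[cut:]]
    return train, test

train, test = make_modular_addition_data()
```

Because the full input space is only p² examples, the network can memorize the training half outright, which is exactly what makes the later jump in test accuracy so striking.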
Accuracy
1.0 |      Train ______________________
    |     /
    |    /                      Test
0.5 |   /                      ________
    |  /                      /
0.0 |_/______________________/________ Time
       memorize              generalize
       (early)               (much later)

Training accuracy saturates early. Test accuracy stays low, then snaps up far later.

Grokking suggests that 'more training' can sometimes qualitatively change a model's behavior: not just improve a score, but switch to a different algorithm internally. That has implications for how we evaluate safety during training.
We show that neural networks can 'grok' algorithmic tasks, generalizing well after overfitting the training set.
— Power et al., "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" (2022)
The big idea: learning is not monotonic. Grokking shows that long after training looks 'done,' the internal algorithm can still be changing.