Logit Lens: Peeking at Predictions Mid-Forward-Pass
A transformer processes a token through many layers before outputting a prediction. The logit lens shows you what the model would predict if it stopped at each layer along the way.
Lesson map
What this lesson covers
Learning path (the main moves in order):
1. A Diagnostic Probe for the Residual Stream
Concept cluster (terms to connect while reading): logit lens · residual stream · layer
Section 1
A Diagnostic Probe for the Residual Stream
Transformers build up predictions layer by layer. Each layer reads the residual stream (the running hidden state) and writes a correction into it. The logit lens, popularized in a 2020 LessWrong post by nostalgebraist, applies the model's final unembedding matrix to the residual stream at each intermediate layer, as if the forward pass had stopped there and the model had to predict from that state.
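To make that concrete, here is a minimal sketch of the core operation, assuming a GPT-2 checkpoint loaded through the Hugging Face transformers library. The prompt is illustrative, and attribute names like model.transformer.ln_f and model.lm_head are specific to that GPT-2 implementation; other architectures expose the final layer norm and unembedding under different names.

```python
# Minimal logit-lens sketch (illustrative; assumes the GPT-2 checkpoint from
# Hugging Face transformers, where the final layer norm is model.transformer.ln_f
# and the unembedding matrix is model.lm_head).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The Eiffel Tower is located in the city of"  # illustrative prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    out = model(input_ids, output_hidden_states=True)

# out.hidden_states holds the residual stream after the embedding and after
# each block: for GPT-2 small, 13 tensors of shape (batch, seq_len, d_model).
for layer, h in enumerate(out.hidden_states):
    # Apply the *final* layer norm and unembedding to an intermediate state,
    # as if the forward pass had ended here.
    logits = model.lm_head(model.transformer.ln_f(h[:, -1, :]))
    top_id = logits.argmax(dim=-1).item()
    print(f"layer {layer:2d}: top prediction = {tokenizer.decode(top_id)!r}")
```

Reading the per-layer top predictions this way typically shows the pattern described next.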
What you see
- Early layers: predictions close to the input token or simple patterns
- Middle layers: predictions related to the general category or topic
- Later layers: predictions refining toward the correct next token
- Near the end: the final answer crystallizes
Variants and refinements
1. Tuned lens (Belrose et al., 2023): train small per-layer translators for a more accurate reading
2. Logit difference: compare predictions for a target token vs. a distractor token (see the sketch after this list)
3. Direct logit attribution: decompose which model components contributed to a prediction
4. Patchscopes: use a stronger model to interpret activations from a weaker one
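As an illustration of the logit-difference variant (item 2 above), the sketch below reuses the loop from the earlier example and tracks how strongly each layer favors one candidate completion over another. The choice of " Paris" as target and " London" as distractor is an assumed example.

```python
# Logit-difference sketch (builds on the previous example's `out`, `model`,
# and `tokenizer`). Token choices are illustrative.
target_id = tokenizer.encode(" Paris")[0]       # hoped-for next token
distractor_id = tokenizer.encode(" London")[0]  # plausible wrong answer

for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[:, -1, :]))
    diff = (logits[0, target_id] - logits[0, distractor_id]).item()
    print(f"layer {layer:2d}: logit(' Paris') - logit(' London') = {diff:+.2f}")
```

Plotting this difference across layers shows where in the network the correct answer pulls ahead of the distractor.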
The big idea: the logit lens is one of the cheapest interpretability tools. One line of code gives you a new window into how a transformer thinks.
