Logit Lens: Peeking at Predictions Mid-Forward-Pass
A transformer processes a token through many layers before outputting a prediction. The logit lens shows you what the model would predict if it stopped at each layer along the way.
A Diagnostic Probe for the Residual Stream
Transformers build up predictions layer by layer. Each layer reads the residual stream, a running hidden state, and writes an update back into it. The logit lens technique, popularized by a 2020 LessWrong post by nostalgebraist, applies the model's final layer norm and unembedding matrix to the residual stream at an intermediate layer, as if the forward pass ended there.
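The readout itself can be sketched in a few lines. The example below is a minimal illustration in plain Python, not a real model: the four-word vocabulary, the identity unembedding matrix `W_U`, and the hand-picked `LAYER_UPDATES` are all assumptions chosen so the arithmetic is easy to follow, and the layer norm omits the learned scale and shift a real model would have.

```python
import math

def layer_norm(h, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(h) / len(h)
    var = sum((x - mean) ** 2 for x in h) / len(h)
    return [(x - mean) / math.sqrt(var + eps) for x in h]

def unembed(W_U, h):
    """Compute logits: one dot product per vocabulary row of W_U."""
    return [sum(w * x for w, x in zip(row, h)) for row in W_U]

def run_residual_stream(embedding, updates):
    """Return the residual stream after the embedding and after each layer's update."""
    states = [list(embedding)]
    for update in updates:
        states.append([a + b for a, b in zip(states[-1], update)])
    return states

def logit_lens(residual_states, W_U, vocab):
    """Read a top-1 token off every residual-stream state, not just the last one."""
    tops = []
    for h in residual_states:
        logits = unembed(W_U, layer_norm(h))
        tops.append(vocab[logits.index(max(logits))])
    return tops

# Hypothetical toy values, chosen by hand for illustration only.
VOCAB = ["the", "cat", "sat", "mat"]
W_U = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
EMBED_THE = [1.0, 0.0, 0.0, 0.0]   # embedding of the input token "the"
LAYER_UPDATES = [
    [0.0, 0.5, 0.0, 0.2],          # layer 1 writes a small update
    [-0.5, 0.3, 0.0, 0.9],         # layer 2 pushes toward "mat"
    [0.0, 0.0, 0.0, 1.0],          # layer 3 reinforces "mat"
]

states = run_residual_stream(EMBED_THE, LAYER_UPDATES)
print(logit_lens(states, W_U, VOCAB))  # ['the', 'the', 'mat', 'mat']
```

With a real model you would collect the per-layer hidden states from your framework and reuse the model's own final layer norm and unembedding weights in place of these toy versions.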
What you see
- Early layers: predictions close to the input token or simple patterns
- Middle layers: predictions related to the general category or topic
- Later layers: predictions converge toward the token the model will finally output
- Near the end: the final answer crystallizes
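One way to make "crystallizes" measurable is to track the probability of the final answer at every layer and find the earliest layer from which it stays the top prediction. The sketch below assumes you already have per-layer logits from a logit lens pass; `PER_LAYER_LOGITS` is a made-up three-token example, and `crystallization_layer` is a hypothetical helper name, not a standard metric.

```python
import math

def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def crystallization_layer(per_layer_logits):
    """Return (final_token, first_layer), where first_layer is the earliest layer
    from which final_token is top-1 at every layer up to the last one."""
    final_logits = per_layer_logits[-1]
    final_token = final_logits.index(max(final_logits))
    first_layer = len(per_layer_logits) - 1
    # Walk backward from the last layer until the top-1 token changes.
    for layer in range(len(per_layer_logits) - 1, -1, -1):
        logits = per_layer_logits[layer]
        if logits.index(max(logits)) != final_token:
            break
        first_layer = layer
    return final_token, first_layer

# Hypothetical logit lens output: 4 readout points, vocabulary of 3 tokens.
PER_LAYER_LOGITS = [
    [2.0, 0.0, 0.0],   # early: dominated by token 0 (e.g. the input token)
    [1.5, 1.0, 0.5],   # still token 0, but less confident
    [0.5, 1.0, 2.0],   # token 2 takes over
    [0.0, 0.5, 4.0],   # token 2 is now near-certain
]

final_token, first_layer = crystallization_layer(PER_LAYER_LOGITS)
final_token_probs = [softmax(l)[final_token] for l in PER_LAYER_LOGITS]

print("final token:", final_token, "stable from layer:", first_layer)
for layer, p in enumerate(final_token_probs):
    print(f"layer {layer}: p(final token) = {p:.3f}")
```

On real models the trajectory is often noisier than this toy suggests, so treat the layer number as a rough indicator rather than a sharp boundary.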
Variants and refinements
- Tuned lens (Belrose et al. 2023): train small translators per layer for more accurate reading
- Logit difference: compare predictions for target vs. distractor tokens
- Direct logit attribution: decompose which components contributed to a prediction
- Patchscopes: patch an activation into a separate forward pass, of the same model or a stronger one, and let that model decode it into text
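Two of these variants, logit difference and direct logit attribution, fit together naturally: because the residual stream is a sum of component outputs and unembedding is linear, the target-minus-distractor logit difference splits into one contribution per component. The sketch below uses hand-picked toy vectors and an identity unembedding matrix, and it skips the final layer norm to keep the decomposition exact; in a real model the layer norm's scaling has to be accounted for, so treat this as an illustration under those assumptions.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def logit_diff(residual, W_U, target, distractor):
    """Logit of the target token minus logit of the distractor token."""
    return dot(W_U[target], residual) - dot(W_U[distractor], residual)

def attribute_logit_diff(components, W_U, target, distractor):
    """Project each component's output onto the (target - distractor) direction."""
    direction = [t - d for t, d in zip(W_U[target], W_U[distractor])]
    return {name: dot(vec, direction) for name, vec in components.items()}

# Hypothetical toy values: 4-token vocabulary, identity unembedding.
VOCAB = ["the", "cat", "sat", "mat"]
W_U = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
COMPONENTS = {
    "embed":   [1.0, 0.0, 0.0, 0.0],
    "layer_1": [0.0, 0.5, 0.0, 0.2],
    "layer_2": [-0.5, 0.3, 0.0, 0.9],
    "layer_3": [0.0, 0.0, 0.0, 1.0],
}

target = VOCAB.index("mat")
distractor = VOCAB.index("the")

# The final residual stream is the elementwise sum of every component's output.
final_residual = [sum(vals) for vals in zip(*COMPONENTS.values())]

total = logit_diff(final_residual, W_U, target, distractor)
contributions = attribute_logit_diff(COMPONENTS, W_U, target, distractor)

print(f"total logit difference (mat - the): {total:.2f}")
for name, value in contributions.items():
    print(f"{name}: {value:+.2f}")
```

In this toy, the embedding pushes toward the distractor (it is the input token itself) while layer 2 contributes most toward the target, and the contributions add up to the total logit difference.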
The big idea: the logit lens is one of the cheapest interpretability tools. It needs no extra training, and a few lines of code give you a new window into how a transformer builds up its prediction.
