Logit Lens: Peeking at Predictions Mid-Forward-Pass
A transformer processes a token through many layers before outputting a prediction. The logit lens shows you what the model would predict if it stopped at each layer along the way.
25 min · Reviewed 2026
A Diagnostic Probe for the Residual Stream
Transformers build up predictions layer by layer. Each layer reads the residual stream (a running hidden state) and adds an update to it. The logit lens, popularized by a 2020 LessWrong post by nostalgebraist, takes the residual stream at an intermediate layer and passes it through the model's final layer norm and unembedding matrix, as if the model had to make its prediction at that point.
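The idea can be sketched in a few lines of NumPy. This is a minimal illustration, not a real model: the "layers" are random residual updates, the layer norm has no learned scale or bias, and the names layer_norm, logit_lens, and W_U are chosen here for the example.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned parameters)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def logit_lens(residual_states, W_U):
    """Decode every residual state with the final layer norm and unembedding.

    residual_states: shape (n_layers + 1, d_model), one row per point in the stack
    W_U:             shape (d_model, vocab_size)
    Returns logits of shape (n_layers + 1, vocab_size).
    """
    return np.stack([layer_norm(h) @ W_U for h in residual_states])

rng = np.random.default_rng(0)
d_model, vocab_size, n_layers = 8, 5, 4

# Toy stand-in for a transformer (assumption): each layer adds a random update.
W_U = rng.normal(size=(d_model, vocab_size))
h = rng.normal(size=d_model)
states = [h]
for _ in range(n_layers):
    h = h + 0.5 * rng.normal(size=d_model)  # residual update written by one layer
    states.append(h)
states = np.array(states)

lens_logits = logit_lens(states, W_U)
top_token_per_layer = lens_logits.argmax(axis=-1)
# One token id per layer; the last row is the toy model's actual output.
print(top_token_per_layer)
```

With a real model the residual states would come from the framework's hidden-state outputs, and the last row of the lens would match the model's ordinary logits.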
What you see
Early layers: predictions close to the input token or simple patterns
Middle layers: predictions related to the general category or topic
Later layers: predictions refine toward the correct next token
Near the end: the final answer crystallizes
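A common way to read this pattern off the lens is to track one token's probability and rank at every layer. The sketch below uses hand-built data, not real activations: the residual state is assumed to slide in a straight line from the input token's direction to the target token's direction, and the unembedding is a scaled identity. It reproduces only the early and late behavior; the topic-level predictions of middle layers are not modeled. The helper name target_trajectory is hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def target_trajectory(lens_logits, target_id):
    """Per-layer probability and rank (0 = top prediction) of one token."""
    probs = softmax(lens_logits)[:, target_id]
    # Rank = how many tokens have a strictly higher logit than the target.
    ranks = (lens_logits > lens_logits[:, [target_id]]).sum(axis=-1)
    return probs, ranks

# Hand-built toy data (assumption): d_model equals vocab_size and the
# unembedding is a scaled identity, so each direction decodes to one token.
vocab_size, n_layers = 4, 5
input_id, target_id = 0, 2
W_U = 4.0 * np.eye(vocab_size)

# The state moves linearly from the input token toward the target token.
t = np.linspace(0.0, 1.0, n_layers + 1)[:, None]
states = (1 - t) * np.eye(vocab_size)[input_id] + t * np.eye(vocab_size)[target_id]

lens_logits = states @ W_U  # layer norm omitted to keep the toy readable
probs, ranks = target_trajectory(lens_logits, target_id)
top_tokens = lens_logits.argmax(axis=-1)

print(top_tokens.tolist())  # -> [0, 0, 0, 2, 2, 2]
print(ranks.tolist())       # -> [1, 1, 1, 0, 0, 0]
print(np.round(probs, 3))   # target probability rises at every layer
```

The early rows decode to the input token, and the target takes over only in the later rows, which is the shape the list above describes.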
Variants and refinements
Tuned lens (Belrose et al. 2023): train a small affine translator for each layer so intermediate states decode more reliably than with the raw logit lens
Logit difference: compare predictions for target vs. distractor tokens
Direct logit attribution: decompose which components contributed to a prediction
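Patchscopes (Ghandeharioun et al. 2024): patch an activation into a separate forward pass, of the same model or a more capable one, with a prompt designed to make the model describe it in natural language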
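Two of these variants, logit difference and direct logit attribution, follow from the same fact: the unembedding is linear, so a logit computed from a sum of component outputs splits into one term per component. The sketch below assumes random toy component outputs and ignores the final layer norm, which real analyses have to account for (typically by holding its scale fixed). All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size, n_components = 8, 6, 5

W_U = rng.normal(size=(d_model, vocab_size))

# Toy data (assumption): each row is what one component (the embedding,
# an attention head, an MLP) wrote into the residual stream.
component_outputs = rng.normal(size=(n_components, d_model))
final_residual = component_outputs.sum(axis=0)

target_id, distractor_id = 1, 4

# Logit difference: how strongly the model prefers target over distractor.
logits = final_residual @ W_U
logit_diff = logits[target_id] - logits[distractor_id]

# Direct logit attribution: project each component's output onto the
# direction that separates the two tokens in unembedding space.
diff_direction = W_U[:, target_id] - W_U[:, distractor_id]
per_component = component_outputs @ diff_direction

print(np.round(per_component, 3))       # one contribution per component
print(np.isclose(per_component.sum(), logit_diff))  # contributions add up
```

A positive entry means that component pushed toward the target token; a negative entry means it pushed toward the distractor.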
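The big idea: the logit lens is one of the cheapest interpretability tools. It needs no training and no extra data: reusing the model's own unembedding on an intermediate state takes roughly one line of code and shows how a prediction forms layer by layer.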
End-of-lesson check
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-safety2-logit-lens-builders
What is the main idea of "Logit Lens: Peeking at Predictions Mid-Forward-Pass"?
A transformer processes a token through many layers before outputting a prediction.
Use AI as the final authority for the whole decision
Avoid checking the answer once it sounds polished
Focus only on speed instead of judgment
Which concept is most central to "Logit Lens: Peeking at Predictions Mid-Forward-Pass"?
residual stream
logit lens
layer
unembedding
Which use of AI fits this topic best?
Let the AI decide what matters without your review
Use the answer before checking whether it fits the situation
Early layers: predictions close to the input token or simple patterns
Use the first answer without checking it
What should a careful learner remember about "Why this is useful"?
Use AI to draft or organize ideas about logit lens, then verify before acting.
Skip the context so the tool can guess faster
Treat the output as private even after sharing it online
Use the answer without checking the source
You want to use AI after this lesson. What is the safest next step?
Act immediately because the AI answer is written clearly
AI cannot make the human values decision for you.
Hide uncertainty so the final answer looks cleaner
Use private or sensitive details before checking permission
How should AI output about logit lens be treated?
As proof that no other source is needed
As a replacement for context, consent, or expert review
As a draft or helper output that still needs human judgment and verification
As something that becomes correct when it sounds confident
Name one way to verify an AI answer about logit lens.
Which action would help you apply "Logit Lens: Peeking at Predictions Mid-Forward-Pass" responsibly?
Use the tool to avoid thinking through the tradeoff
Keep going even if the output conflicts with a trusted source
Use the first answer without checking it
Middle layers: predictions related to the general category or topic