Logit Lens: Peeking at Predictions Mid-Forward-Pass
A transformer processes a token through many layers before outputting a prediction. The logit lens shows you what the model would predict if it stopped at each layer along the way.
Lesson map
What this lesson covers
Learning path (the main moves in order):
1. A Diagnostic Probe for the Residual Stream
Concept cluster (terms to connect while reading): logit lens · residual stream · layer
Section 1
A Diagnostic Probe for the Residual Stream
Transformers build up predictions layer by layer. Each layer reads the residual stream (the running hidden state) and writes a correction into it. The logit lens, popularized in a 2020 LessWrong post by nostalgebraist, applies the model's final unembedding matrix to the residual stream at each intermediate layer, as if the forward pass had stopped there and the model had to predict from that state.
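To make that concrete, here is a minimal sketch of the core operation, assuming a GPT-2 checkpoint loaded through the Hugging Face transformers library. The prompt is illustrative, and attribute names like model.transformer.ln_f and model.lm_head are specific to that GPT-2 implementation; other architectures expose the final layer norm and unembedding under different names.

```python
# Minimal logit-lens sketch (illustrative; assumes the GPT-2 checkpoint from
# Hugging Face transformers, where the final layer norm is model.transformer.ln_f
# and the unembedding matrix is model.lm_head).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The Eiffel Tower is located in the city of"  # illustrative prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    out = model(input_ids, output_hidden_states=True)

# out.hidden_states holds the residual stream after the embedding and after
# each block: for GPT-2 small, 13 tensors of shape (batch, seq_len, d_model).
for layer, h in enumerate(out.hidden_states):
    # Apply the *final* layer norm and unembedding to an intermediate state,
    # as if the forward pass had ended here.
    logits = model.lm_head(model.transformer.ln_f(h[:, -1, :]))
    top_id = logits.argmax(dim=-1).item()
    print(f"layer {layer:2d}: top prediction = {tokenizer.decode(top_id)!r}")
```

Reading the per-layer top predictions this way typically shows the pattern described next.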
What you see
- Early layers: predictions close to the input token or simple patterns
- Middle layers: predictions related to the general category or topic
- Later layers: predictions refining toward the correct next token
- Near the end: the final answer crystallizes
Variants and refinements
1. Tuned lens (Belrose et al., 2023): train small per-layer translators for a more accurate reading
2. Logit difference: compare predictions for a target token vs. a distractor token (see the sketch after this list)
3. Direct logit attribution: decompose which model components contributed to a prediction
4. Patchscopes: use a stronger model to interpret activations from a weaker one
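As an illustration of the logit-difference variant (item 2 above), the sketch below reuses the loop from the earlier example and tracks how strongly each layer favors one candidate completion over another. The choice of " Paris" as target and " London" as distractor is an assumed example.

```python
# Logit-difference sketch (builds on the previous example's `out`, `model`,
# and `tokenizer`). Token choices are illustrative.
target_id = tokenizer.encode(" Paris")[0]       # hoped-for next token
distractor_id = tokenizer.encode(" London")[0]  # plausible wrong answer

for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[:, -1, :]))
    diff = (logits[0, target_id] - logits[0, distractor_id]).item()
    print(f"layer {layer:2d}: logit(' Paris') - logit(' London') = {diff:+.2f}")
```

Plotting this difference across layers shows where in the network the correct answer pulls ahead of the distractor.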
The big idea: the logit lens is one of the cheapest interpretability tools. One line of code gives you a new window into how a transformer thinks.
