Probing asks a simple question: given a model's hidden state, can a small classifier predict some property? The answer tells you what information can be recovered from the model's representations, not whether the model actually uses that information.
Take a labeled dataset: sentences marked 'positive sentiment' vs 'negative,' or 'factual' vs 'hallucinated.' Extract the model's hidden activations on each input. Train a small classifier (a probe) to predict the label from the activations. If the probe predicts well on examples it was not trained on, the model represents that information somewhere in those activations.
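The recipe above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the "activations" are synthetic vectors standing in for real hidden states, and the helper names (make_synthetic_activations, train_linear_probe, probe_accuracy) are hypothetical, chosen only for this example. The probe is plain logistic regression trained with stochastic gradient descent and scored on a held-out split.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_activations(n, dim, rng):
    """Hypothetical stand-in for real hidden states (an assumption).

    Each vector is Gaussian noise shifted along one fixed direction,
    with the sign of the shift set by the binary label.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        sign = 1.0 if label == 1 else -1.0
        vec = [rng.gauss(0.0, 1.0) + sign * d for d in direction]
        data.append((vec, label))
    return data


def train_linear_probe(data, dim, epochs=20, lr=0.1):
    """Fit logistic regression (a linear probe) with plain SGD."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = y - p  # gradient of the log-likelihood w.r.t. the logit
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


def probe_accuracy(w, b, data):
    """Fraction of examples whose label the probe predicts correctly."""
    correct = 0
    for x, y in data:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(data)


rng = random.Random(0)
data = make_synthetic_activations(400, 16, rng)
train, held_out = data[:300], data[300:]  # score on unseen examples
w, b = train_linear_probe(train, 16)
print(f"held-out probe accuracy: {probe_accuracy(w, b, held_out):.2f}")
```

With a real model, the synthetic generator would be replaced by activations extracted from a chosen layer. High held-out accuracy would then show that the label is linearly decodable there. It would not show that the model relies on it.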
Probing tells you what is in the water. It does not tell you what the fish is doing.
— Anna Rogers, on interpretability research (paraphrased)
The big idea: probing is the cheapest, oldest tool in the interpretability kit. It is also still one of the most useful, especially when paired with intervention experiments.
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-safety2-probing-creators
What is the core idea behind "Probing: Linear, Nonlinear, and Contrast"?
Which term best describes a foundational idea in "Probing: Linear, Nonlinear, and Contrast"?
A learner studying Probing: Linear, Nonlinear, and Contrast would need to understand which concept?
Which of these is directly relevant to Probing: Linear, Nonlinear, and Contrast?
Which of the following is a key point about Probing: Linear, Nonlinear, and Contrast?
Which of these does NOT belong in a discussion of Probing: Linear, Nonlinear, and Contrast?
Which statement is accurate regarding Probing: Linear, Nonlinear, and Contrast?
Which of these does NOT belong in a discussion of Probing: Linear, Nonlinear, and Contrast?
What is the key insight about "Why linear matters most" in the context of Probing: Linear, Nonlinear, and Contrast?
What is the key insight about "The probing pitfall" in the context of Probing: Linear, Nonlinear, and Contrast?
Which statement accurately describes an aspect of Probing: Linear, Nonlinear, and Contrast?
What does working with Probing: Linear, Nonlinear, and Contrast typically involve?
Which best describes the scope of "Probing: Linear, Nonlinear, and Contrast"?
Which section heading best belongs in a lesson about Probing: Linear, Nonlinear, and Contrast?
Which section heading best belongs in a lesson about Probing: Linear, Nonlinear, and Contrast?