Probing: Linear, Nonlinear, and Contrast

Probing asks a simple question: given a model's hidden state, can a small classifier predict some property? The answer tells you what information can be decoded from the model's representations, not whether the model actually uses that information.

33 min · Reviewed 2026

The Probe Recipe

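Take a labeled dataset, for example sentences marked 'positive sentiment' vs 'negative', or 'factual' vs 'hallucinated'. Extract the model's hidden activations on each input at a chosen layer. Train a small classifier (a probe) to predict the label from the activations, and evaluate it on held-out examples. If it clearly beats a baseline, such as the same probe trained on shuffled labels, the information is decodable from the model's activations at that layer.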
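
The recipe above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the activations are synthetic stand-ins (a real experiment would extract them from a model), and the names `make_synthetic_activations`, `train_linear_probe`, and `probe_accuracy` are hypothetical helpers invented for this example. The probe is plain logistic regression trained by gradient descent, and a shuffled-label control shows what chance-level performance looks like.

```python
import numpy as np

def make_synthetic_activations(n=400, dim=16, seed=0):
    """Stand-in for real hidden states (assumption): the label shifts
    each activation along one hidden unit direction."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    acts = rng.normal(size=(n, dim)) + np.outer(2.0 * labels - 1.0, direction) * 2.0
    return acts, labels

def train_linear_probe(acts, labels, lr=0.1, steps=500):
    """Logistic regression by gradient descent; returns (weights, bias)."""
    n, dim = acts.shape
    w, b = np.zeros(dim), 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        grad = probs - labels              # gradient of the log loss w.r.t. logits
        w -= lr * (acts.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(acts, labels, w, b):
    preds = (acts @ w + b > 0).astype(int)
    return float((preds == labels).mean())

acts, labels = make_synthetic_activations()
split = 300  # first 300 examples for training, the rest held out

w, b = train_linear_probe(acts[:split], labels[:split])
test_acc = probe_accuracy(acts[split:], labels[split:], w, b)

# Control: identical probe, but the labels carry no information about the inputs.
shuffled = np.random.default_rng(1).permutation(labels)
w_c, b_c = train_linear_probe(acts[:split], shuffled[:split])
control_acc = probe_accuracy(acts[split:], shuffled[split:], w_c, b_c)

print(f"probe accuracy: {test_acc:.2f}, shuffled-label control: {control_acc:.2f}")
```

The gap between the two numbers, rather than the raw probe accuracy, is the evidence that the property is decodable from the activations.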

Types of probes

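Linear probe: a single linear layer; the simplest and most trusted, because it has little capacity to learn the task on its own
Nonlinear probe: an MLP or deeper network; detects information that is present but not linearly decodable, though a high-capacity probe may be computing the answer itself rather than reading it out
Contrast probe (CCS, Contrast-Consistent Search): unsupervised; trains on the constraint that a statement and its negation should receive probabilities that sum to one
Direction finding: not quite a probe, but similar; find a direction in activation space that corresponds to a concept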
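
To make the contrast-probe idea concrete, here is a hedged sketch of the CCS objective for a linear probe. It combines a consistency term (the probabilities for a statement and its negation should sum to one) with a confidence term (which penalizes the trivial answer of 0.5 for both). The function name `ccs_loss` and the synthetic contrast pairs are assumptions made for illustration; the published method also normalizes the two sets of activations independently before training, which is omitted here because the toy data has no such offset.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ccs_loss(w, b, acts_pos, acts_neg):
    """CCS-style objective for a linear probe p(x) = sigmoid(w.x + b).

    acts_pos[i] / acts_neg[i]: activations for statement i phrased as
    true / as false. No ground-truth labels are used.
    """
    p_pos = sigmoid(acts_pos @ w + b)
    p_neg = sigmoid(acts_neg @ w + b)
    consistency = (p_pos - (1.0 - p_neg)) ** 2   # p(x+) should equal 1 - p(x-)
    confidence = np.minimum(p_pos, p_neg) ** 2   # discourage p(x+) = p(x-) = 0.5
    return float(np.mean(consistency + confidence))

# Synthetic contrast pairs (assumption): truth is encoded along axis 0.
rng = np.random.default_rng(0)
n, dim = 200, 8
truth = rng.integers(0, 2, size=n) * 2 - 1       # +1 if the statement is true, else -1
direction = np.zeros(dim)
direction[0] = 1.0
acts_pos = 0.5 * rng.normal(size=(n, dim)) + np.outer(truth, direction) * 2.0
acts_neg = 0.5 * rng.normal(size=(n, dim)) - np.outer(truth, direction) * 2.0

aligned_loss = ccs_loss(5.0 * direction, 0.0, acts_pos, acts_neg)
degenerate_loss = ccs_loss(np.zeros(dim), 0.0, acts_pos, acts_neg)
print(f"aligned probe: {aligned_loss:.4f}, always-0.5 probe: {degenerate_loss:.4f}")
```

A probe aligned with the hidden truth direction scores far lower than the degenerate probe that always outputs 0.5. Note that the objective alone cannot tell which end of the direction means "true", so the sign has to be resolved separately.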

Classic findings

BERT encodes syntactic trees in its hidden states (Hewitt and Manning, 2019)
Several studies report directions in LLM activations that separate true from false statements and persist across layers
Chess-playing transformers encode full board state in their activations
Some 'emergent' capabilities are reported to be detectable by probes before they appear in the model's output

Probing tells you what is in the water. It does not tell you what the fish is doing.
— Anna Rogers, on interpretability research (paraphrased)

The big idea: probing is the cheapest, oldest tool in the interpretability kit. It is also still one of the most useful, especially when paired with intervention experiments.
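
As a rough illustration of pairing a probe-style result with an intervention, the sketch below finds a concept direction as the difference of class means and then projects it out of the activations. Everything here is an assumption for the sake of the example: the data is synthetic, and `mean_difference_direction` and `ablate_direction` are hypothetical helper names. In a real experiment the ablated activations would be written back into the model's forward pass and the change in its output measured; that step depends on the model and is left out.

```python
import numpy as np

def mean_difference_direction(acts, labels):
    """Unit vector pointing from the class-0 mean to the class-1 mean."""
    diff = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate_direction(acts, direction):
    """Remove each activation's component along a unit-length direction."""
    return acts - np.outer(acts @ direction, direction)

def class_gap(acts, labels, direction):
    """Difference between class means after projecting onto the direction."""
    proj = acts @ direction
    return float(proj[labels == 1].mean() - proj[labels == 0].mean())

# Synthetic activations (assumption): the concept shifts activations along `concept`.
rng = np.random.default_rng(0)
n, dim = 400, 16
labels = rng.integers(0, 2, size=n)
concept = rng.normal(size=dim)
concept /= np.linalg.norm(concept)
acts = rng.normal(size=(n, dim)) + np.outer(2.0 * labels - 1.0, concept) * 2.0

direction = mean_difference_direction(acts, labels)
ablated = ablate_direction(acts, direction)

gap_before = class_gap(acts, labels, direction)
gap_after = class_gap(ablated, labels, direction)
cosine = float(direction @ concept)
print(f"gap before: {gap_before:.2f}, after ablation: {gap_after:.2e}, cosine: {cosine:.3f}")
```

If removing the direction changes the model's behavior on the relevant task, that is evidence the model uses the information; if behavior is unchanged, the probe may have found something the model represents but does not rely on.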

End-of-lesson check

8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-safety2-probing-creators

What is the main idea of "Probing: Linear, Nonlinear, and Contrast"?
1. Probing asks a simple question: given a model's hidden state, can a small classifier predict some property?
2. Use AI as the final authority for the whole decision
3. Avoid checking the answer once it sounds polished
4. Focus only on speed instead of judgment
Which concept is most central to "Probing: Linear, Nonlinear, and Contrast"?
1. linear probe
2. probing
3. representation
4. probe
Which use of AI fits this topic best?
1. Let the AI decide what matters without your review
2. Use the answer before checking whether it fits the situation
3. Linear probe: a single linear layer — simplest and most trusted
4. Treat the AI output as automatically correct
What should a careful learner remember about "Why linear matters most"?
1. Use "Why linear matters most" as a reminder to verify the AI output before anyone relies on it.
2. Skip the context so the tool can guess faster
3. Treat the output as private even after sharing it online
4. Use the answer without checking the source
You want to use AI after this lesson. What is the safest next step?
1. Act immediately because the AI answer is written clearly
2. AI cannot make the human values decision for you.
3. Hide uncertainty so the final answer looks cleaner
4. Use private or sensitive details before checking permission
How should AI output about probing be treated?
1. As proof that no other source is needed
2. As a replacement for context, consent, or expert review
3. As a draft or helper output that still needs human judgment and verification
4. As something that becomes correct when it sounds confident
Name one way to verify an AI answer about probing.
Which action would help you apply "Probing: Linear, Nonlinear, and Contrast" responsibly?
1. Use the tool to avoid thinking through the tradeoff
2. Keep going even if the output conflicts with a trusted source
3. Treat the AI output as automatically correct
4. Nonlinear probe: an MLP or deeper network; detects information that is present but not linearly decodable


