A circuit is a small sub-network inside a big model that implements one specific behavior. Finding circuits is how researchers build causal evidence for how a model does what it does.
28 min · Reviewed 2026
From Features to Circuits
Finding features tells you what a model represents. Circuits tell you how it computes. A circuit is a specific subset of attention heads and MLP components that, together, implement a particular capability.
Famous examples
Induction heads: detect 'A B A' and predict 'B' next, enabling in-context learning
IOI circuit: identifies the indirect object in sentences like 'John and Mary went to the store; John gave a drink to ___'
Modular addition circuit: found in a small transformer trained to compute (a+b) mod p, it works by composing rotations in a Fourier basis
Greater-than circuit: in prompts like 'The war lasted from the year 1732 to the year 17__', favors completions that make the end year later than the start year
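The modular addition example above can be made concrete with a minimal sketch. This is not the trained transformer's weights, only the arithmetic idea behind it: encode each number as a rotation at a few frequencies, compose the rotations so their angles add, then pick the candidate answer that lines up best. The function names, the modulus p = 113, and the frequencies (1, 2, 5) are assumptions chosen for illustration.

```python
import cmath
import math


def rotation(x: int, k: int, p: int) -> complex:
    """Encode x as a point on the unit circle, rotated by 2*pi*k*x/p."""
    return cmath.exp(2j * math.pi * k * x / p)


def mod_add_via_rotations(a: int, b: int, p: int, freqs=(1, 2, 5)) -> int:
    """Compute (a + b) mod p by composing rotations (illustrative sketch)."""
    # Multiplying two unit complex numbers adds their angles,
    # so each product encodes a + b at frequency k.
    combined = {k: rotation(a, k, p) * rotation(b, k, p) for k in freqs}

    def score(c: int) -> float:
        # The real part equals cos(2*pi*k*(a + b - c)/p), which reaches
        # its maximum of 1 only when c matches (a + b) mod p.
        return sum(
            (combined[k] * rotation(c, k, p).conjugate()).real for k in freqs
        )

    # The best-aligned candidate is the answer.
    return max(range(p), key=score)
```

For example, mod_add_via_rotations(100, 50, 113) returns 37, which is (100 + 50) mod 113. No explicit addition or remainder step appears anywhere: the wrap-around comes for free because rotations are periodic.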
Why circuits matter for safety
A circuit-level understanding could reveal deceptive reasoning as it happens
Circuits for sycophancy or refusal, once identified, can be audited directly
Ablating a circuit can remove a capability without full retraining
Circuits that generalize across models are candidates for universal interpretability claims
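The ablation idea in the list above can be sketched on a toy model. Everything here is hypothetical: the component names (head_1.4, head_2.7, mlp_3), their weights, and the scoring function are invented for illustration and do not correspond to any real model or library. The sketch zero-ablates a chosen set of components and measures how much a behavior score drops.

```python
from typing import Callable, Dict, FrozenSet, List

Component = Callable[[List[str]], float]


def repeats_earlier_token(tokens: List[str]) -> float:
    """Return 1.0 if the last token also appears earlier in the prompt."""
    return 1.0 if tokens[-1] in tokens[:-1] else 0.0


# Hypothetical components. Each adds a contribution to one output score,
# loosely mimicking heads and MLPs writing into a shared residual stream.
COMPONENTS: Dict[str, Component] = {
    "head_1.4": lambda toks: 2.0 * repeats_earlier_token(toks),
    "head_2.7": lambda toks: 1.5 * repeats_earlier_token(toks),
    "mlp_3": lambda toks: 0.5,  # constant contribution, unrelated to the pattern
}


def behavior_score(
    tokens: List[str], ablated: FrozenSet[str] = frozenset()
) -> float:
    """Sum the contributions of every component that is not zero-ablated."""
    return sum(
        fn(tokens) for name, fn in COMPONENTS.items() if name not in ablated
    )


def ablation_effect(tokens: List[str], names: FrozenSet[str]) -> float:
    """Drop in the score when the named components are zero-ablated."""
    return behavior_score(tokens) - behavior_score(tokens, names)
```

On the prompt ['A', 'B', 'A'], ablating the two heads removes 3.5 of the 4.0 score, while ablating mlp_3 removes only 0.5. That gap is the kind of evidence used to argue that a set of components forms a circuit. Zeroing is the simplest choice here. Published work often replaces a component's output with its mean or with activations from a different prompt instead, since zeros can push a real model off-distribution.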
The big idea: circuits are the wiring diagrams of neural networks. We can draw a few of them. We cannot yet draw most. That asymmetry is the state of the art.
End-of-lesson check
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-safety2-circuits-builders
What is the main idea of "Circuits in Neural Networks"?
A circuit is a small sub-network inside a big model that implements one specific behavior.
Use AI as the final authority for the whole decision
Avoid checking the answer once it sounds polished
Focus only on speed instead of judgment
Which concept is most central to "Circuits in Neural Networks"?
attention head
circuit
interpretability
ablation
Which use of AI fits this topic best?
Let the AI decide what matters without your review
Use the answer before checking whether it fits the situation
Induction heads: detect 'A B A' and predict 'B' next, enabling in-context learning
Use the first answer without checking it
What should a careful learner remember about "How circuits get discovered"?
Use AI to draft or organize ideas about circuits, then verify before acting.
Skip the context so the tool can guess faster
Treat the output as private even after sharing it online
Use the answer without checking the source
You want to use AI after this lesson. What is the safest next step?
Act immediately because the AI answer is written clearly
AI cannot make the human values decision for you.
Hide uncertainty so the final answer looks cleaner
Use private or sensitive details before checking permission
How should AI output about circuits be treated?
As proof that no other source is needed
As a replacement for context, consent, or expert review
As a draft or helper output that still needs human judgment and verification
As something that becomes correct when it sounds confident
Name one way to verify an AI answer about circuits.
Which action would help you apply "Circuits in Neural Networks" responsibly?
Use the tool to avoid thinking through the tradeoff
Keep going even if the output conflicts with a trusted source
Use the first answer without checking it
IOI circuit: identifies the indirect object in sentences like 'John and Mary went to the store; John gave a drink to ___'