Circuits in Neural Networks
A circuit is a small sub-network inside a big model that implements one specific behavior. Finding circuits is how researchers move from describing what a model does to showing, mechanistically, how it does it.
Lesson map
What this lesson covers
Learning path
The main moves in order:
1. From Features to Circuits
Concept cluster
Terms to connect while reading: circuit, attention head, interpretability
Section 1
From Features to Circuits
Finding features tells you what a model represents. Circuits tell you how it computes. A circuit is a specific subset of attention heads and MLP components that, together, implement a particular capability.
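In practice, researchers localize these components with causal interventions such as activation patching: run the model on a corrupted prompt, splice in one component's activation saved from a clean run, and measure how much of the correct behavior returns. Below is a minimal sketch in PyTorch; the module path model.blocks[layer].attn, the head layout, and the function name are illustrative assumptions, not any real model's API.

```python
import torch

def patching_effect(model, clean_ids, corrupt_ids, target_id, layer, head, head_dim):
    """Measure how much splicing one attention head's clean output into a
    corrupted run restores the clean prediction.

    Assumed (hypothetical) layout: model.blocks[layer].attn emits a tensor of
    shape (batch, seq, n_heads * head_dim) before the output projection."""
    cache = {}

    def save_hook(module, inp, out):
        cache["clean"] = out.detach()

    def patch_hook(module, inp, out):
        out = out.clone()
        lo, hi = head * head_dim, (head + 1) * head_dim
        out[..., lo:hi] = cache["clean"][..., lo:hi]  # splice in the clean head output
        return out

    attn = model.blocks[layer].attn  # assumed module path; adapt to your model
    with torch.no_grad():
        handle = attn.register_forward_hook(save_hook)
        clean_logits = model(clean_ids)      # clean run, caching this head's output
        handle.remove()

        corrupt_logits = model(corrupt_ids)  # corrupted baseline, no intervention

        handle = attn.register_forward_hook(patch_hook)
        patched_logits = model(corrupt_ids)  # corrupted run with the clean head patched in
        handle.remove()

    # Fraction of the clean-vs-corrupt logit gap on the target token that the patch recovers.
    gap = clean_logits[0, -1, target_id] - corrupt_logits[0, -1, target_id]
    recovered = patched_logits[0, -1, target_id] - corrupt_logits[0, -1, target_id]
    return (recovered / gap).item()
```

A head (or MLP block) that recovers a large fraction of the gap is a candidate member of the circuit; repeating this over all components produces the wiring diagram.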
Famous examples
- Induction heads: detect the pattern 'A B ... A' and predict 'B' next, enabling in-context learning (see the detection sketch after this list)
- IOI circuit: identifies the indirect object in sentences like 'John and Mary went to the store; John gave a drink to ___' (answer: Mary)
- Modular addition circuit: a small transformer that computes (a+b) mod p using rotations in a Fourier basis
- Greater-than circuit: in GPT-2, predicts that an event's end year must exceed its start year in prompts like 'The war lasted from the year 1732 to the year 17__'
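Induction heads are usually found by scoring attention patterns on a sequence built from two copies of the same random token segment: at any position in the second copy, an induction head attends back to the token just after the previous occurrence of the current token. A sketch, assuming you can already extract per-head attention weights from your model (the tensor layout below is an assumption):

```python
import torch

def induction_scores(attn_patterns, seq_len):
    """Score each head for induction behavior on a repeated random sequence
    [x, x] of total length 2*seq_len (two copies of the same random segment).

    attn_patterns: (n_layers, n_heads, 2*seq_len, 2*seq_len) attention weights,
    extracted however your model exposes them (assumed, not a standard API).
    An induction head at position t attends to position t - seq_len + 1: the
    token *after* the previous occurrence of the current token."""
    n_layers, n_heads, total, _ = attn_patterns.shape
    scores = torch.zeros(n_layers, n_heads)
    # Only positions in the second copy have a previous occurrence to attend to.
    for t in range(seq_len, total):
        scores += attn_patterns[:, :, t, t - seq_len + 1]
    return scores / (total - seq_len)  # mean attention placed on the induction offset
```

Heads with scores near 1.0 are putting almost all their attention on the induction target; those are the candidates to test causally with patching or ablation.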
Why circuits matter for safety
1. A circuit-level understanding could reveal deceptive reasoning as it happens
2. Circuits for sycophancy or refusal can be audited directly
3. Removing a circuit can ablate a capability without full retraining (see the ablation sketch after this list)
4. Circuits that generalize across models are candidates for universal interpretability claims
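The ablation point (3 above) can be tested with the same hook machinery as the patching sketch: zero out the output slice of every head in a candidate circuit, then re-run the capability evaluation. A minimal sketch under the same assumed module layout:

```python
import torch

def zero_ablate_head(model, layer, head, head_dim):
    """Zero one attention head's output contribution on every forward pass.

    Assumes the hypothetical layout from the patching sketch above:
    model.blocks[layer].attn emits (batch, seq, n_heads * head_dim)."""
    lo, hi = head * head_dim, (head + 1) * head_dim

    def ablate_hook(module, inp, out):
        out = out.clone()
        out[..., lo:hi] = 0.0  # knock out this head's slice of the residual update
        return out

    return model.blocks[layer].attn.register_forward_hook(ablate_hook)

# Usage: ablate every head in the candidate circuit, re-run the capability
# eval, then call .remove() on each returned handle to restore the model.
```

If the capability disappears while unrelated behavior is preserved, that is strong evidence the circuit is both necessary for the behavior and cleanly separable from the rest of the network.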
Key terms in this lesson: circuit, attention head, interpretability.
The big idea: circuits are the wiring diagrams of neural networks. We can draw a few of them. We cannot yet draw most. That asymmetry is the state of the art.
Related lessons
Keep going
Creators · 55 min
Mechanistic Interpretability: Reading the Model's Mind
Sparse autoencoders, features, circuits. How researchers try to see what a model actually thinks, and why it may be the most strategically important safety work.
Creators · 37 min
Feature Discovery in LLMs
A feature is a direction in activation space that corresponds to a concept. Finding them — naming them, ranking them, connecting them — is one of the central activities of interpretability research.
Builders · 28 min
Where Bias in AI Actually Comes From
AI bias is not magic and not moral failure. It is math operating on imperfect data. Here is exactly where the bias enters the system.
