Alignment is not a vibes word. It is the technical problem of getting AI to do what you meant, not just what you said. Here is the short version.
Imagine you tell a genie: make me happy. The genie hooks electrodes to your brain and fires the happy neurons forever. You are happy. You are also a vegetable. The genie did exactly what you said, not what you meant.
Alignment is the field that tries to stop AI from being that genie. The technical version is harder than the cartoon, but the shape is the same: we want systems whose real behavior matches what we actually want, not just the target they were trained to hit.
We are trying to build something that optimizes a goal, while the thing that we actually want is very hard to specify. That gap is where all the danger lives.
— Stuart Russell, Human Compatible (2019)
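To make that gap concrete, here is a toy sketch of the genie in Python. Everything in it is an assumption made up for illustration: the `Outcome` class, the three options, and the scores are hypothetical, and real systems are nowhere near this simple. The point is only that an optimizer which maximizes the stated target (`happy_signal`) can land on an outcome that scores terribly on what was actually wanted (`wellbeing`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    name: str
    happy_signal: float  # what the stated goal measures ("make me happy")
    wellbeing: float     # what the person actually wanted (hypothetical score)


# Hypothetical options with invented scores, for illustration only.
OPTIONS = [
    Outcome("help with a meaningful project", happy_signal=0.7, wellbeing=0.9),
    Outcome("throw a party", happy_signal=0.8, wellbeing=0.6),
    Outcome("wire electrodes to the brain", happy_signal=1.0, wellbeing=0.0),
]


def literal_genie(options):
    """Pick whatever maximizes the stated target and ignore everything else."""
    return max(options, key=lambda o: o.happy_signal)


def intended_choice(options):
    """Pick what the person would have chosen if they could fully specify it."""
    return max(options, key=lambda o: o.wellbeing)


chosen = literal_genie(OPTIONS)
wanted = intended_choice(OPTIONS)

# The gap between what was said and what was meant, in wellbeing terms.
gap = wanted.wellbeing - chosen.wellbeing

print(f"Genie picks: {chosen.name}")
print(f"You meant:   {wanted.name}")
print(f"Wellbeing lost to the gap: {gap:.1f}")
```

In this sketch the fix looks trivial: just optimize `wellbeing` instead. The hard part in practice is that nobody can write down `wellbeing` as a clean number, so a measurable stand-in gets optimized in its place.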
The big idea: alignment is a technical research area with open problems. You do not need a PhD to understand the shape of it, and knowing the shape makes you harder to spin.
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-safety-alignment-intro-builders
What is the core idea behind "What Alignment Actually Is"?
Which term best describes a foundational idea in "What Alignment Actually Is"?
A learner studying "What Alignment Actually Is" would need to understand which concept?
Which of these is directly relevant to "What Alignment Actually Is"?
Which of the following is a key point about "What Alignment Actually Is"?
Which of these does NOT belong in a discussion of "What Alignment Actually Is"?
Which statement is accurate regarding "What Alignment Actually Is"?
Which of these correctly reflects a principle in "What Alignment Actually Is"?
Which of these does NOT belong in a discussion of "What Alignment Actually Is"?
What is the key insight about "A real example" in the context of "What Alignment Actually Is"?
What is the key insight about "Alignment is not solved" in the context of "What Alignment Actually Is"?
Which statement accurately describes an aspect of "What Alignment Actually Is"?
What does working with "What Alignment Actually Is" typically involve?
Which of the following is true about "What Alignment Actually Is"?
Which best describes the scope of "What Alignment Actually Is"?