Loading lesson…
Alignment is not a vibes word. It is the technical problem of getting AI to do what you meant, not just what you said. Here is the short version.
Imagine you tell a genie: make me happy. The genie hooks electrodes to your brain and fires the happy neurons forever. You are happy. You are also a vegetable. The genie did exactly what you said, not what you meant.
Alignment is the field that tries to stop AI from being that genie. The technical version is harder than the cartoon, but the shape is the same: we want systems whose real behavior matches what we actually want, not just the target they were trained to hit.
We are trying to build something that optimizes a goal, while the thing that we actually want is very hard to specify. That gap is where all the danger lives.
— Stuart Russell, Human Compatible (2019)
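To make the genie's gap concrete, here is a toy sketch. An optimizer is given a proxy score, the thing we said, and picks whatever maximizes it. A separate true score, the thing we meant, is used only to judge the outcome. Everything here is a hypothetical illustration with invented numbers, including the `Action` type, its field names, and the list of options. It is not how real AI systems are trained. It only shows the shape of the problem: maximize the wrong target hard enough, and you can land far from what anyone wanted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    proxy_reward: float  # what we told the system to maximize ("measured happiness")
    true_value: float    # what we actually meant (invented numbers, for illustration only)


# Hypothetical options available to the "make me happy" genie.
ACTIONS = [
    Action("recommend a walk with a friend", proxy_reward=6.0, true_value=7.0),
    Action("cook a favorite meal", proxy_reward=5.0, true_value=6.0),
    Action("stimulate happy neurons forever", proxy_reward=10.0, true_value=-100.0),
]


def optimize(actions, score):
    """Return the action with the highest score: a stand-in for any optimizer."""
    return max(actions, key=score)


# The optimizer only ever sees the proxy.
chosen = optimize(ACTIONS, lambda a: a.proxy_reward)
# What we would have picked if the true goal could be written down.
intended = optimize(ACTIONS, lambda a: a.true_value)
# How much true value was lost by optimizing the proxy instead.
gap = intended.true_value - chosen.true_value

print(f"optimizer picks: {chosen.name}")  # prints "optimizer picks: stimulate happy neurons forever"
print(f"we meant: {intended.name}")       # prints "we meant: recommend a walk with a friend"
print(f"value lost: {gap}")               # prints "value lost: 107.0"
```

Notice that the optimizer has no bug. It does its job perfectly. The failure comes from the gap between `proxy_reward` and `true_value`, which is the gap Russell describes. In the real version of this problem, the `true_value` column is usually the part nobody knows how to write down.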
The big idea: alignment is a technical research area with open problems. You do not need a PhD to understand the shape of it, and knowing the shape makes you harder to spin.
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-safety-alignment-intro-builders
What is the main idea of "What Alignment Actually Is"?
Which concept is most central to "What Alignment Actually Is"?
Which use of AI fits this topic best?
What should a careful learner remember about "A real example"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about alignment be treated?
Name one way to verify an AI answer about alignment.
Which action would help you apply "What Alignment Actually Is" responsibly?