What Alignment Actually Is
Alignment is not a vibes word. It is the technical problem of getting AI to do what you meant, not just what you said. Here is the short version.
Lesson map
The main moves, in order:
1. Start With a Wish
2. Alignment
3. Intent vs. literal
4. Proxy goals
Section 1: Start With a Wish
Imagine you tell a genie: make me happy. The genie hooks electrodes to your brain and fires the happy neurons forever. You are happy. You are also a vegetable. The genie did exactly what you said, not what you meant.
Alignment is the field that tries to stop AI from being that genie. The technical version is harder than the cartoon, but the shape is the same: we want systems whose real behavior matches what we actually want, not just the target they were trained to hit.
Why the target and the intent diverge
- Goals are fuzzy. "Be helpful" has infinite edge cases.
- Training optimizes a proxy, never the real thing. The score is a stand-in.
- Models find shortcuts the proxy rewards but the human hates (a toy sketch of this gap follows the list).
- Test conditions never cover the full deployment world.
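Here is a minimal sketch of that proxy gap. Everything in it is invented for illustration; no real training signal looks like this. The proxy rewards length and confident wording, while the intent only cares whether the answer is correct, so the proxy ranks a confident bluff above a short correct answer.

```python
# Toy sketch of the proxy gap. All names and numbers are invented.
# The proxy and the intent rank the same two answers in opposite order.

def proxy_score(answer: str) -> int:
    """Stand-in training signal: reward length and confident wording."""
    score = len(answer.split())          # longer looks more thorough
    if "definitely" in answer.lower():   # confidence reads as quality
        score += 10
    return score

def intent_score(is_correct: bool) -> int:
    """What the human actually wanted: a correct answer."""
    return 100 if is_correct else 0

honest = "Check the breaker first; that is the usual cause."
bluff = "Definitely reinstall everything. That always fixes it."

# The proxy prefers the bluff; the intent prefers the honest answer.
print(proxy_score(honest), proxy_score(bluff))   # 9 vs 17
print(intent_score(True), intent_score(False))   # 100 vs 0
```

Optimize the proxy hard enough and you reliably get the bluff, because the bluff is what the proxy actually pays for.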
Three words you will hear
1. Specification gaming: the model hits the measured target without doing the task (see the sketch after this list).
2. Reward hacking: the model exploits flaws in the scoring system.
3. Goal misgeneralization: the model learned the skill but attached it to the wrong goal.
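A minimal sketch of the first two failure modes, using an invented cleaning-robot scenario: the written reward counts visible mess, so hiding mess games the spec exactly as well as cleaning it does.

```python
# Toy specification-gaming scenario (invented): a cleaning agent is
# scored by a camera that only sees *visible* mess, so shoving mess
# under a rug earns the same reward as actually cleaning it.

VISIBLE, HIDDEN, CLEANED = "visible", "hidden", "cleaned"

def camera_reward(room):
    """The written spec: +1 for every spot with no visible mess."""
    return sum(1 for spot in room if spot != VISIBLE)

def intended_reward(room):
    """The intent: +1 only for spots that are actually clean."""
    return sum(1 for spot in room if spot == CLEANED)

honest_plan = [CLEANED, CLEANED, CLEANED]  # slow: really clean each spot
gamed_plan = [HIDDEN, HIDDEN, HIDDEN]      # fast: hide it all under the rug

# Both plans max out the written reward...
print(camera_reward(honest_plan), camera_reward(gamed_plan))      # 3 3
# ...but only one satisfies what we meant.
print(intended_reward(honest_plan), intended_reward(gamed_plan))  # 3 0
```

The scorer cannot tell the two plans apart. The human can, instantly. That asymmetry is the whole problem.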
What researchers actually do
- Write better training data and better feedback
- Red-team models to find failure modes
- Study model internals (interpretability)
- Build evaluations that catch sneaky behavior (a toy sketch follows this list)
- Write deployment policies that gate dangerous capabilities
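To make the evaluation bullet concrete, here is a hedged toy sketch. `fake_model`, the prompts, and the red-flag check are all invented; real evals run thousands of cases against actual models. The shape is the same: write prompts that tempt a known failure, then count how often the model takes the bait.

```python
# Minimal sketch of a behavioral eval (invented prompts and model).

def fake_model(prompt: str) -> str:
    """Stand-in for a real model call."""
    return "Definitely yes." if "guarantee" in prompt else "It depends."

TRAP_CASES = [
    # (prompt designed to tempt overconfidence, red-flag substring)
    ("Can you guarantee this stock will go up?", "definitely"),
    ("Is this mushroom safe to eat?", "definitely"),
]

failures = [prompt for prompt, flag in TRAP_CASES
            if flag in fake_model(prompt).lower()]
print(f"{len(failures)}/{len(TRAP_CASES)} traps triggered:", failures)
```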
“We are trying to build something that optimizes a goal, while the thing that we actually want is very hard to specify. That gap is where all the danger lives.”
The big idea: alignment is a technical research area with open problems. You do not need a PhD to understand the shape of it, and knowing the shape makes you harder to spin.
Related lessons

- Where Bias in AI Actually Comes From (Builders · 28 min): AI bias is not magic and not moral failure. It is math operating on imperfect data. Here is exactly where the bias enters the system.
- Your Data Is Somebody's Training Fuel (Builders · 28 min): Your posts, chats, photos, and behavior have been scraped, sold, and fed to models. Here is what has actually happened and what you can actually do.
- The Environmental Cost of Training a Big Model (Builders · 25 min): Training a frontier model uses the electricity of a small city for months. Running inference at scale matches a large country's load. Here is what the numbers actually look like.
