Lesson 1440 of 1455
What AI Safety Research Actually Is
The field trying to make sure AI stays good for humans — explained for teens.
Builders · AI Foundations · ~4 min read
The big idea
AI safety is a serious technical field with researchers at every major lab and dozens of independent organizations. It's not just 'don't say bad words' — it's making sure increasingly capable AI systems do what humans actually want, and that we'd notice if they didn't. Several teen-accessible paths into this work exist now.
Some examples
- Alignment research: getting models to faithfully pursue human goals.
- Interpretability: understanding what's happening inside the model's neural network.
- Evaluations: building tests that catch dangerous capabilities before deployment.
- Governance: writing the rules nations and companies follow.
Try it!
Read the homepage of one AI safety org (Anthropic, ARC, Apollo, MATS) and note what jobs they hire for.
Key terms in this lesson
Practice this safely
Try this with a school, hobby, or family example where the stakes are low. Use the AI output as a draft you can question, not as the final answer.
- 1Ask AI to explain alignment in plain language, then underline anything that sounds uncertain or too broad.
- 2Give it one detail from "What AI Safety Research Actually Is" and ask for two possible next steps plus one reason each step might be wrong.
- 3Check interpretability against a trusted source, teacher, adult, expert, or original document before you use it.
End-of-lesson quiz
Check what stuck
8 questions · Score saves to your progress.
Lesson help
Questions are best handled with a grown-up here.
For this age range, Tendril keeps freeform AI chat paused until parent/guardian consent and child-safe moderation are fully verified. Use the quiz, notes, and related lessons below, or ask a parent, guardian, teacher, or librarian to work through the question with you.
Progress saved locally in this browser. Sign in to sync across devices.
Related lessons
Keep going
Creators · 30 min
Sparse Autoencoders: Looking Inside an AI Model's Brain
Sparse autoencoders decompose model activations into interpretable features, opening the black box for safety and debugging.
Creators · 11 min
How AI Models Get Safety Training: RLHF in Plain Words
Why models refuse what they refuse, and how that shapes their behavior.
Builders · 30 min
The Supervised Learning Loop
Most modern AI is trained on a loop of guess, check, and adjust. Understand the loop and you understand the heart of machine learning.
