What a Benchmark Is and Why It Matters
Benchmarks are the main way AI progress gets measured. Understanding them is the first step in evaluating any claim about what an AI can do.
Builders · AI Foundations · ~13 min read
Standardized Tests for Machines
A benchmark is a fixed set of problems with known correct answers. Every model attempts the same set. The score is usually the percentage of problems answered correctly. That is it. No magic.
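To make "the score is the percentage correct" concrete, here is a minimal Python sketch. The questions, the `toy_model` function, and the exact-match rule are all assumptions for illustration. Real benchmarks often use more forgiving answer matching or other metrics.

```python
# A minimal sketch of benchmark scoring, assuming each problem has exactly one
# correct answer and answers are compared as exact strings.

# Hypothetical benchmark: a fixed list of (question, correct_answer) pairs.
BENCHMARK = [
    ("2 + 2", "4"),
    ("Capital of France", "Paris"),
    ("3 * 5", "15"),
    ("Opposite of hot", "cold"),
]

def toy_model(question: str) -> str:
    """Hypothetical stand-in for a real model: it knows only some answers."""
    known = {"2 + 2": "4", "Capital of France": "Paris", "3 * 5": "16"}
    return known.get(question, "I don't know")

def score(model, benchmark) -> float:
    """Return the percentage of problems the model answers exactly right."""
    correct = sum(1 for question, answer in benchmark if model(question) == answer)
    return 100 * correct / len(benchmark)

print(f"{score(toy_model, BENCHMARK):.1f}%")  # → 50.0%
```

The toy model gets two of the four right (it says 16 for 3 * 5 and does not know the last one), so it scores 50%. Any other model would be passed the same `BENCHMARK` list, which is what makes the numbers comparable.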
Why they matter
- They make different models directly comparable
- They give researchers a common language for progress
- They create pressure to improve in measurable ways
- They let outsiders check claims of capability
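Comparability depends on every model facing the exact same problems. The Python sketch below ties each reported score to a fingerprint (a SHA-256 hash) of the problem set and ranks only the scores measured on that same version. This is also one way an outsider could check which set a claim was measured on. The model names, scores, and the `fingerprint` and `leaderboard` helpers are hypothetical.

```python
import hashlib
import json

def fingerprint(benchmark) -> str:
    """Return a short hash that identifies one exact version of a problem set."""
    data = json.dumps(benchmark, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()[:12]

# Hypothetical fixed benchmark and an edited copy with one extra problem.
BENCHMARK = [["2 + 2", "4"], ["Capital of France", "Paris"]]
EDITED = BENCHMARK + [["3 * 5", "15"]]
OFFICIAL_ID = fingerprint(BENCHMARK)

# Hypothetical reported results; model-c was scored on the edited copy.
results = [
    {"model": "model-a", "score": 50.0, "benchmark_id": OFFICIAL_ID},
    {"model": "model-b", "score": 100.0, "benchmark_id": OFFICIAL_ID},
    {"model": "model-c", "score": 100.0, "benchmark_id": fingerprint(EDITED)},
]

def leaderboard(results, benchmark_id):
    """Rank only the results computed on the same exact problem set, best first."""
    comparable = [r for r in results if r["benchmark_id"] == benchmark_id]
    return sorted(comparable, key=lambda r: r["score"], reverse=True)

for rank, r in enumerate(leaderboard(results, OFFICIAL_ID), start=1):
    print(rank, r["model"], r["score"])
# 1 model-b 100.0
# 2 model-a 50.0
```

Model-c's perfect score is left off the board. It came from a different set of problems, so it is not directly comparable to the others.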
The life cycle of a benchmark
1. Launch: the benchmark is released along with a baseline score.
2. Climb: the research community races to push the score higher.
3. Saturation: scores approach human-level performance or the maximum possible score.
4. Retirement: it stops being useful, and a harder benchmark replaces it.
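The first three stages can be read off a benchmark's score history. The sketch below is a simplified, hypothetical rule of thumb. The 2-point margin and the example scores are invented. Retirement is left out because it is a decision people make, not something the numbers alone reveal.

```python
def lifecycle_stage(top_scores, ceiling=100.0, margin=2.0):
    """Guess a benchmark's life-cycle stage from its best score over time.

    top_scores: best reported score after each period, oldest first.
    ceiling: human-level performance or the maximum possible score.
    margin: how close to the ceiling counts as saturated (an assumed cutoff).
    """
    if not top_scores:
        raise ValueError("need at least one score")
    if top_scores[-1] >= ceiling - margin:
        return "saturation"  # little room left to improve
    if len(top_scores) == 1:
        return "launch"      # only the baseline exists so far
    return "climb"           # scores are rising but not yet near the ceiling

print(lifecycle_stage([32.0]))                             # launch
print(lifecycle_stage([32.0, 55.5, 71.0]))                 # climb
print(lifecycle_stage([32.0, 55.5, 71.0, 89.0, 98.5]))     # saturation
print(lifecycle_stage([40.0, 85.0, 89.5], ceiling=90.0))   # saturation
```

The last call uses a human baseline of 90 as the ceiling instead of 100, so 89.5 already counts as saturated.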
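“When a measure becomes a target, it ceases to be a good measure.” This is Goodhart's law, in the wording of anthropologist Marilyn Strathern. For benchmarks, it means that once a high score becomes the goal, models can be tuned to the test itself, and the score starts saying less about real ability.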
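A deliberately extreme toy shows how a score can stop being a good measure. Below, a hypothetical `memorizer` has stored the answers to the public test, while `solver` actually does the addition. Both ace the public test, but only the solver holds up on fresh problems of the same kind. The problem sets and both "models" are invented for illustration. Real cases are far subtler.

```python
def make_problems(start, n=20):
    """Build n addition problems; different start values give non-overlapping sets."""
    return [(f"{a} + {a + 7}", str(a + a + 7)) for a in range(start, start + n)]

public_test = make_problems(10)  # the published benchmark
fresh_test = make_problems(50)   # new problems of the same kind, never published

# Hypothetical "model" that simply memorized the public answers.
memorized = dict(public_test)

def memorizer(question):
    return memorized.get(question, "?")

# Hypothetical "model" that actually computes the sum.
def solver(question):
    a, b = question.split(" + ")
    return str(int(a) + int(b))

def score(model, problems):
    """Percentage of problems answered exactly right."""
    return 100 * sum(model(q) == answer for q, answer in problems) / len(problems)

for name, model in [("memorizer", memorizer), ("solver", solver)]:
    print(f"{name}: public {score(model, public_test):.0f}%, "
          f"fresh {score(model, fresh_test):.0f}%")
# memorizer: public 100%, fresh 0%
# solver: public 100%, fresh 100%
```

Testing on fresh problems that the model could not have seen is what exposes the gap.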
The big idea: benchmarks are measuring sticks, not finish lines. Treat scores as clues, not verdicts.
