Tendril

Elo Ratings for AI

Born in chess, now everywhere in AI evaluation. Learn why Elo works and where it quietly misleads.

A System From 1960 Chess

Arpad Elo designed his rating system for the US Chess Federation, which adopted it in 1960. In the form used today, the math is a logistic curve: the expected score of player A against player B is a smooth function of their rating difference alone. A 400-point gap corresponds to an expected score of roughly 91 percent for the stronger player.
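To make the curve concrete, here is a minimal sketch of the standard logistic Elo formula with base 10 and a scale of 400. The function name `expected_score` is chosen for illustration and does not come from the lesson.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the logistic Elo curve.

    Only the difference rating_a - rating_b enters the formula.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Print the expected score for a few rating gaps.
for gap in (0, 100, 200, 400):
    print(gap, round(expected_score(1500 + gap, 1500), 3))
```

A 400-point gap gives 10/11, which is the "roughly 91 percent" quoted above. Adding the same constant to both ratings leaves the result unchanged.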

Key properties

Only the difference matters, not the absolute rating
Ratings update after every game, in proportion to how far the result was from the expected score
Beating a stronger opponent earns more points
Losses to weaker opponents cost more points
Over many games, a rating converges to a stable estimate
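The properties above all follow from one update rule: new rating = old rating + K × (actual score − expected score). The sketch below assumes K = 32, a common choice that the lesson does not specify, and the helper names are illustrative.

```python
def expected_score(rating_a, rating_b):
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, score_a, k=32.0):
    """Return new (rating_a, rating_b) after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k=32 is an assumed step size, not a value from the lesson.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# An upset: the 1400-rated player beats the 1600-rated player.
underdog_after, favorite_after = update(1400, 1600, 1.0)
upset_gain = underdog_after - 1400

# The expected result: the 1600-rated player beats the 1400-rated player.
winner_after, loser_after = update(1600, 1400, 1.0)
routine_gain = winner_after - 1600

print(round(upset_gain, 2), round(routine_gain, 2))
```

The underdog's gain is about three times the favorite's gain for a routine win, and in both cases the points one player gains are exactly the points the other loses.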

Where Elo breaks for AI

1. Skill is not one-dimensional: a model great at coding and bad at poetry cannot be summarized as one number
2. Non-transitive preferences exist (A beats B, B beats C, C beats A) and Elo cannot represent them
3. Ratings can inflate as new strong models enter the pool, so numbers from different periods are hard to compare
4. Models that never played each other can only be compared indirectly, through shared opponents
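The non-transitivity point can be seen in a toy experiment. The sketch below feeds a perfect rock-paper-scissors cycle into sequential Elo updates; the three models, the round count, and K = 32 are all hypothetical choices for illustration.

```python
def expected_score(rating_a, rating_b):
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def run_cycle(rounds=200, k=32.0):
    """Apply Elo updates to a strict cycle: A beats B, B beats C, C beats A."""
    ratings = {"A": 1500.0, "B": 1500.0, "C": 1500.0}
    cycle = [("A", "B"), ("B", "C"), ("C", "A")]  # (winner, loser) pairs
    for _ in range(rounds):
        for winner, loser in cycle:
            delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
            ratings[winner] += delta
            ratings[loser] -= delta
    return ratings


ratings = run_cycle()
# Elo's predicted probability that A beats B, given the fitted ratings.
p_ab = expected_score(ratings["A"], ratings["B"])
print({name: round(r, 1) for name, r in ratings.items()}, round(p_ab, 3))
```

In the data A beats B every single time, yet the three ratings stay close together and the predicted probability for A over B sits near one half. A single number per model has no way to encode the cycle.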

Compare the options

Elo strength	Elo weakness
Simple to compute	Assumes single-dimensional skill
Updates online	Needs many games to stabilize
Human-interpretable	Ignores task differences
Widely familiar	Hides uncertainty in a single number

“The rating system is not a moral judgment but a best-guess estimate of relative strength.”
Arpad Elo, The Rating of Chessplayers, Past and Present (1978)

The big idea: Elo is a compact, elegant way to rank competitors, but a single number hides a lot. Always look at the confidence interval around a rating and at the per-category breakdown.
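One common way to put an interval around a rating is to bootstrap the game log: resample the games with replacement, refit the ratings each time, and read off percentiles. The sketch below is a minimal illustration under assumptions; the game log, the model names, K = 16, and 200 resamples are all made up for the example.

```python
import random


def expected_score(rating_a, rating_b):
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def fit_elo(games, k=16.0, base=1500.0):
    """Sequential Elo over a list of (winner, loser) pairs."""
    ratings = {}
    for winner, loser in games:
        rw = ratings.setdefault(winner, base)
        rl = ratings.setdefault(loser, base)
        delta = k * (1.0 - expected_score(rw, rl))
        ratings[winner] = rw + delta
        ratings[loser] = rl - delta
    return ratings


def bootstrap_interval(games, model, n_resamples=200, seed=0):
    """Approximate 95% percentile interval for one model's rating."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_resamples):
        # Resampling with replacement also reshuffles game order,
        # which matters because sequential Elo is order-dependent.
        resample = [rng.choice(games) for _ in games]
        samples.append(fit_elo(resample).get(model, 1500.0))
    samples.sort()
    low = samples[n_resamples * 25 // 1000]
    high = samples[n_resamples * 975 // 1000 - 1]
    return low, high


# Hypothetical log: model_x wins 30 of 50 games against model_y.
games = [("model_x", "model_y")] * 30 + [("model_y", "model_x")] * 20
low, high = bootstrap_interval(games, "model_x")
print(round(low, 1), round(high, 1))
```

With only 50 games the interval is wide compared with the gap between the two models, which is exactly the information a bare rating leaves out. Running the same procedure separately per task category gives the breakdown mentioned above.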
