The world's most influential 'leaderboard' for AI is not a test — it is humans voting blindly. Here is how that works.
Chatbot Arena, run by LMSYS Org (now often branded LMArena), is a website where you type a prompt, two anonymous models respond, and you vote on which response is better. After millions of votes, a ranking emerges. It is harder to game than a fixed benchmark because there is no fixed test set to memorize: the prompts are whatever real people happen to ask.
Arena scores models with the Elo rating system from chess. Each model starts at 1000. When model A beats model B, A's score rises and B's falls by the same amount, with the size of the change scaled by how surprising the outcome was. Over millions of games, the ratings settle into a stable ranking.
Simplified Elo update:
Expected(A vs B) = 1 / (1 + 10^((Rb - Ra)/400))
New Ra = Ra + K * (actual - expected)
Here actual is 1 if A wins, 0 if A loses, and 0.5 for a tie.
K is usually 16-32. Beating a higher-rated opponent
earns more points than beating a weaker one.
Elo rating in one paragraph of math
"We collect over 100,000 pairwise votes to analyze the strengths and weaknesses of various LLMs."
— Chiang et al., LMSYS Chatbot Arena paper (2024)
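The simplified Elo update above can be sketched in a few lines of Python. This is a minimal illustration of the two formulas only, not Arena's production code; the function names `expected_score` and `elo_update`, and the default K of 32, are assumptions chosen for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B: Expected(A vs B) in the formula above."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one vote.

    score_a is 1.0 if A won the vote, 0.0 if B won, and 0.5 for a tie.
    k = 32 is an illustrative choice from the usual 16-32 range.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating stays constant.
    return rating_a + delta, rating_b - delta


# Two new models both start at 1000 and model A wins the first vote.
new_a, new_b = elo_update(1000.0, 1000.0, score_a=1.0)
print(new_a, new_b)  # 1016.0 984.0

# An upset (1000 beats 1200) moves ratings more than an expected win.
upset_gain = elo_update(1000.0, 1200.0, score_a=1.0)[0] - 1000.0
favorite_gain = elo_update(1200.0, 1000.0, score_a=1.0)[0] - 1200.0
print(round(upset_gain, 1), round(favorite_gain, 1))  # 24.3 7.7
```

With equal ratings the expected score is 0.5, so a win moves each model by K/2 = 16 points. The second example shows the "surprise" scaling: the underdog gains about 24 points for an upset, while the favorite gains under 8 for the expected result.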
The big idea: Arena measures what people like, not what is true. That makes it an excellent signal for chat assistants and a poor one for correctness-critical work.
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-creators-chatbot-arena
What is the core idea behind "How Chatbot Arena Works"?
Which term best describes a foundational idea in "How Chatbot Arena Works"?
A learner studying How Chatbot Arena Works would need to understand which concept?
Which of these is directly relevant to How Chatbot Arena Works?
Which of the following is a key point about How Chatbot Arena Works?
Which of these does NOT belong in a discussion of How Chatbot Arena Works?
Which statement is accurate regarding How Chatbot Arena Works?
Which of these does NOT belong in a discussion of How Chatbot Arena Works?
What is the key insight about "Categories matter" in the context of How Chatbot Arena Works?
What is the key insight about "Style over substance" in the context of How Chatbot Arena Works?
What is the recommended tip about "Ground your practice in fundamentals" in the context of How Chatbot Arena Works?
Which statement accurately describes an aspect of How Chatbot Arena Works?
What does working with How Chatbot Arena Works typically involve?
Which of the following is true about How Chatbot Arena Works?
Which best describes the scope of "How Chatbot Arena Works"?