Live benchmark data — Apr 2026

Compare AI Models

Real benchmark scores from LMSYS Arena, GPQA Diamond, AIME 2025, SWE-bench, MMMLU, and Humanity's Last Exam. No marketing benchmarks.

#1 Overall — Overall Score

Claude Opus 4.7

Anthropic • 1M context • $5 / $25 per 1M

90.2

Overall score

11 models

Rank	Model	Overall	Arena Elo	GPQA Diamond	AIME 2025	SWE-bench	Humanity's Last Exam	MMMLU	Context	Model cost
#1	Claude Opus 4.7 • Anthropic • Apr 2026	90.2	1487	94.2%	98.5%	87.6%	42.1%	91.5%	1M	$5 / $25 per 1M
#2	GPT-5.5 • OpenAI • Apr 2026	89	1495	93.6%	97.2%	82.4%	41.4%	90.2%	256K	$5 / $30 per 1M
#3	Gemini 3.1 Pro • Google • Apr 2026	88.9	1487	91.9%	100%	78.3%	45.8%	91.8%	10M	$2 / $12 per 1M
#4	Kimi K2.5 Thinking • Kimi • Apr 2026	84	1445	88.5%	99.1%	72.1%	44.9%	88.2%	256K	$0.6 / $2.5 per 1M
#5	Grok 4.20 • xAI • Mar 2026	80.3	1456	86.3%	92.4%	68.5%	35.2%	86.7%	2M	$3 / $15 per 1M
#6	DeepSeek R1 • DeepSeek • Jan 2026	80	1424	85.7%	96.3%	71.2%	38.5%	85.1%	128K	$0.55 / $2.19 per 1M
#7	Llama 4 Scout • Meta • Nov 2025	66.2	1380	78.2%	72.5%	58.3%	22.1%	80.5%	10M	Free (self-host)
#8	Mistral Large 3 • Mistral • Feb 2026	64	1370	76.5%	68.2%	55.8%	20.5%	82.3%	128K	~$2 / ~$6 per 1M
#9	Command R+ • Cohere • 2025	41.8	1250	58.2%	35.4%	38.5%	8.2%	72.1%	128K	$2.5 / $10 per 1M
#10	Sonar Pro • Perplexity • 2025	37.6	1280	52.1%	28.3%	25.4%	5.1%	68.5%	128K	$20/mo Pro
#11	Jamba 1.6 • AI21 • 2025	29.6	1180	45.2%	22.1%	28.3%	4.5%	62.3%	256K	API / Custom
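
To compare the per-token prices in the cost column on a concrete workload, a rough estimate can be sketched as below. This is a minimal sketch, not an official pricing calculator: it assumes the two figures in each "$X / $Y per 1M" entry are the input and output prices per 1M tokens, which the table does not state. The `PRICES` mapping, the `request_cost` helper, and the 20,000 / 2,000 token workload are hypothetical names and values chosen for illustration.

```python
# Prices copied from the cost column, in USD per 1M tokens.
# Assumption: first figure = input price, second figure = output price.
PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Kimi K2.5 Thinking": (0.60, 2.50),
    "Grok 4.20": (3.00, 15.00),
    "DeepSeek R1": (0.55, 2.19),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost of one request for `model`."""
    input_price, output_price = PRICES[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 20,000 input tokens and 2,000 output tokens.
    ranked = sorted(PRICES, key=lambda m: request_cost(m, 20_000, 2_000))
    for model in ranked:
        print(f"{model:<20} ${request_cost(model, 20_000, 2_000):.4f}")
```

Models priced per month or by custom contract (Sonar Pro, Jamba 1.6) and the self-hosted Llama 4 Scout are left out because the table gives no per-token price for them.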

Claude Opus 4.7

Anthropic

90.2

#1 Coding • #1 GPQA • 1M Context

1M • $5 / $25 per 1M

GPT-5.5

OpenAI

89

#1 Arena Elo • ARC-AGI 85% • Vision

256K • $5 / $30 per 1M

Gemini 3.1 Pro

Google

88.9

#1 AIME • #1 HLE • #1 MMMLU

10M • $2 / $12 per 1M

Kimi K2.5 Thinking

Kimi

84

Agent Swarm • 100 Agents • Fast

256K • $0.6 / $2.5 per 1M

Grok 4.20

xAI

80.3

Real-time X • Unfiltered • Fast

2M • $3 / $15 per 1M

DeepSeek R1

DeepSeek

80

MIT License • 20x Cheaper • Reasoning

128K • $0.55 / $2.19 per 1M

Llama 4 Scout