Haiku is Anthropic's cheap, fast tier. Here is the math on when it beats Sonnet for production workloads.
Everyone talks about Opus and Sonnet. Haiku 4.5 is the quiet workhorse: approximately $1 in / $5 out per million tokens, sub-second first-token latency, and quality that now rivals what Sonnet 3.5 delivered 18 months ago. For high-volume apps, Haiku is where the margins live.
| Metric | Haiku 4.5 | Sonnet 4.6 |
|---|---|---|
| Input / M tokens | ~$1 | $3 |
| Output / M tokens | ~$5 | $15 |
| Typical p50 latency | <1s | 2-4s |
| Best for | routing, extraction, high QPS | reasoning, long docs, quality chat |
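
To make the price gap concrete, here is a minimal sketch that turns the table's per-million-token prices into a per-call cost. The function name `call_cost`, the `PRICES` dictionary keys, and the 500-in / 50-out token workload are illustrative assumptions, not figures from the lesson. The Haiku prices are approximate in the table.

```python
# Prices in USD per million tokens, copied from the table above.
# The Haiku figures are approximate ("~$1" / "~$5") in the source.
PRICES = {
    "haiku-4.5": {"input": 1.00, "output": 5.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one call at the listed prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical classification call: 500 input tokens, 50 output tokens.
haiku = call_cost("haiku-4.5", 500, 50)
sonnet = call_cost("sonnet-4.6", 500, 50)

print(f"Haiku:  ${haiku:.5f} per call")
print(f"Sonnet: ${sonnet:.5f} per call")
print(f"Sonnet costs {sonnet / haiku:.1f}x as much at this call shape")
```

Under those assumed token counts, a Haiku call comes to about $0.00075 and a Sonnet call to about $0.00225, or roughly $750 versus $2,250 per million calls. Because both the input and output prices in the table differ by the same 3x factor, the ratio holds for any mix of input and output tokens.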
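```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
ticket = "My invoice shows a double charge."  # placeholder ticket text

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=200,
    messages=[{"role": "user", "content": f"Classify: {ticket}"}],
)
```

A routing call that costs a fraction of a cent.

8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-modelx-claude-haiku-45-builders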
What is the main idea of "Claude Haiku 4.5 — speed/cost analysis"?
Which concept is most central to "Claude Haiku 4.5 — speed/cost analysis"?
Which use of AI fits this topic best?
What should a careful learner remember about "Cascade pattern"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about Claude Haiku 4.5 be treated?
Name one way to verify an AI answer about Claude Haiku 4.5.
Which action would help you apply "Claude Haiku 4.5 — speed/cost analysis" responsibly?