Haiku is Anthropic's cheap, fast tier. Here is the math on when it beats Sonnet for production workloads.
Everyone talks about Opus and Sonnet. Haiku 4.5 is the quiet workhorse — approximately $1 in / $5 out per million tokens, sub-second first-token latency, and quality that now rivals what Sonnet 3.5 shipped 18 months ago. For high-volume apps, Haiku is where the margins live.
| Metric | Haiku 4.5 | Sonnet 4.6 |
|---|---|---|
| Input ($ / M tokens) | ~$1 | $3 |
| Output ($ / M tokens) | ~$5 | $15 |
| Typical p50 latency | <1s | 2-4s |
| Best for | routing, extraction, high QPS | reasoning, long docs, quality chat |
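The price gap compounds at volume. A quick sketch of the arithmetic, using the table's per-million-token prices; the traffic level and per-request token counts are illustrative assumptions, not benchmarks:

```python
# Monthly cost comparison using the table's per-million-token prices.
# Request volume and token counts are assumptions for the arithmetic.
HAIKU = {"in": 1.0, "out": 5.0}     # ~$ per million tokens
SONNET = {"in": 3.0, "out": 15.0}

def monthly_cost(prices, requests_per_day, in_tokens, out_tokens, days=30):
    """Dollar cost for a month of traffic at given per-request token counts."""
    per_request = (in_tokens * prices["in"] + out_tokens * prices["out"]) / 1_000_000
    return per_request * requests_per_day * days

# Example: 1M short classification calls/day, ~300 tokens in, ~20 tokens out.
haiku = monthly_cost(HAIKU, 1_000_000, 300, 20)
sonnet = monthly_cost(SONNET, 1_000_000, 300, 20)
print(f"Haiku: ${haiku:,.0f}/mo  Sonnet: ${sonnet:,.0f}/mo")
# → Haiku: $12,000/mo  Sonnet: $36,000/mo
```

At these assumed token counts, routing classification through Haiku is a straight 3x cost reduction before any quality trade-off enters the picture.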
```python
import anthropic

client = anthropic.Anthropic()

# `ticket` holds the incoming support-ticket text.
response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=200,
    messages=[{"role": "user", "content": f"Classify: {ticket}"}],
)
```

A routing call that costs a fraction of a cent.

**Quiz** · 15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-modelx-claude-haiku-45-builders
1. A developer is building a ticket routing system that must handle 10,000 requests per minute. Which model would be most cost-effective for the initial classification step?
2. What does the lesson mean when it says Haiku is "the quiet workhorse"?
3. A student asks what "latency" means in the context of AI models. Which definition is correct?
4. What does QPS stand for in the lesson's comparison table?
5. A developer implements a system where Haiku handles initial document parsing, and if confidence is low, it escalates to Sonnet. What is this architectural pattern called?
6. A product manager wants to reduce API costs by 80% while maintaining quality on complex queries. Which approach does the lesson recommend?
7. Which task is the lesson LEAST likely to recommend Haiku for?
8. A company processes 1 million customer messages per day. Why might Haiku help their bottom line more than Sonnet?
9. The lesson mentions "structured extraction from semi-clean docs." What type of document would be most suitable for Haiku?
10. What is the primary reason to use Haiku for "tool-call decisions in multi-step agents"?
11. Based on the lesson, if a developer's priority is minimum time-to-first-token, which model should they choose?
12. What does the lesson imply about using Haiku for "autocomplete-style suggestions"?
13. A startup is building their first AI product with a limited budget. Why might the lesson suggest starting with Haiku?
14. The lesson states Haiku can handle 80% of work while cutting costs. What happens to the remaining 20% of requests?
15. Why might a company choose NOT to use Haiku for long document summarization?
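The Haiku-first, escalate-to-Sonnet architecture referenced in the questions above can be sketched in a few lines. The model calls are stubbed so the example is self-contained; the 0.8 threshold, stub labels, and helper names are illustrative assumptions, and a real implementation would call the Messages API in their place:

```python
# Sketch: route through a cheap model first, escalate on low confidence.
# Model calls are stubbed; the threshold and labels are assumptions.
from typing import Callable, Tuple

def route_with_escalation(
    ticket: str,
    cheap: Callable[[str], Tuple[str, float]],   # returns (label, confidence)
    strong: Callable[[str], str],
    threshold: float = 0.8,
) -> str:
    """Accept the cheap model's label unless its confidence is below threshold."""
    label, confidence = cheap(ticket)
    if confidence >= threshold:
        return label            # the ~80% path: Haiku's answer stands
    return strong(ticket)       # the ~20% path: escalate to Sonnet

# Stubs standing in for Haiku and Sonnet calls.
haiku_stub = lambda t: ("billing", 0.95) if "invoice" in t else ("unknown", 0.3)
sonnet_stub = lambda t: "account-access"

print(route_with_escalation("invoice overdue", haiku_stub, sonnet_stub))  # → billing
print(route_with_escalation("can't log in", haiku_stub, sonnet_stub))     # → account-access
```

The design keeps the expensive model entirely out of the hot path: Sonnet's latency and price are only paid on the minority of requests the cheap model flags as uncertain.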