AI coding bills surprise teams that don't watch them. Let's break down the real cost drivers, the levers that actually reduce them, and how to set guardrails before your CFO does.
A single engineer using Claude Code heavily can consume hundreds of millions of tokens a month. Across a team, that becomes a serious line item. Most overspend comes from habits, not essential usage, which means most of it is recoverable.
| Model tier | Input $/MTok | Output $/MTok |
|---|---|---|
| Flagship (Claude Opus, GPT-5.5, Gemini Ultra) | $15 | $75 |
| Mid (Claude Sonnet, GPT-5, Gemini Pro) | $3 | $15 |
| Small (Claude Haiku, GPT-5 mini, Gemini Flash) | $0.25 | $1.25 |
| Open-weights self-hosted (Llama 4, Qwen 3) | ~$0 (hardware only) | ~$0 (hardware only) |
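To make the table concrete, here is a minimal sketch that turns token counts into a monthly dollar figure. The prices are copied from the table; the helper name `estimateCostUsd` and the workload of 200M input and 10M output tokens are assumptions chosen for illustration, not measurements.

```typescript
// Prices in USD per million tokens, copied from the table above.
type Tier = "flagship" | "mid" | "small";

interface TierPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

const PRICES: Record<Tier, TierPrice> = {
  flagship: { inputPerMTok: 15, outputPerMTok: 75 },
  mid: { inputPerMTok: 3, outputPerMTok: 15 },
  small: { inputPerMTok: 0.25, outputPerMTok: 1.25 },
};

// Hypothetical helper: cost in USD for a token volume on one tier.
function estimateCostUsd(tier: Tier, inputTokens: number, outputTokens: number): number {
  const price = PRICES[tier];
  return (
    (inputTokens / 1_000_000) * price.inputPerMTok +
    (outputTokens / 1_000_000) * price.outputPerMTok
  );
}

// Assumed workload for one heavy user in one month.
const monthlyInputTokens = 200_000_000;
const monthlyOutputTokens = 10_000_000;

const tiers: Tier[] = ["flagship", "mid", "small"];
for (const tier of tiers) {
  const cost = estimateCostUsd(tier, monthlyInputTokens, monthlyOutputTokens);
  console.log(`${tier}: $${cost.toFixed(2)}`);
}
```

Under these assumptions the same workload costs $3,750 on the flagship tier, $750 on the mid tier, and $62.50 on the small tier, which is why tier choice dominates the bill.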

    // Route simple tasks to cheap models, hard tasks to flagships.
    async function routeTask(task: Task) {
      // classifyComplexity uses a small/cheap model
      const complexity = await classifyComplexity(task);
      if (complexity === 'trivial') {
        return callModel('haiku', task); // $0.25/M
      }
      if (complexity === 'moderate') {
        return callModel('sonnet', task); // $3/M
      }
      return callModel('opus', task); // $15/M
    }
    // Rule of thumb: at scale, 70-80% of tasks route to cheap tiers.
    // Average cost drops 3-5x with no measurable quality loss.

A router in front of your agent is the single highest-leverage optimization. Gateways like Vercel AI Gateway and LiteLLM do this with config.

At sustained high volume, self-hosting an open-weights model like Llama 4 or Qwen 3 can undercut API pricing. The crossover happens around $10-20k/month of API spend, higher if you need flagship quality. Below that, running GPUs is a distraction.
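The routing rule of thumb can be sanity-checked with a little arithmetic. The sketch below computes a blended input price for an assumed routing mix and compares it with sending everything to the flagship tier. The mix (50% Haiku, 25% Sonnet, 25% Opus by token volume) is a hypothetical example, and the calculation ignores output tokens and the cost of the classifier call itself.

```typescript
// Input prices in USD per million tokens, taken from the pricing table.
const INPUT_PRICE = { haiku: 0.25, sonnet: 3, opus: 15 } as const;
type Model = keyof typeof INPUT_PRICE;

// Fraction of input tokens routed to each model; fractions should sum to 1.
type RoutingMix = Record<Model, number>;

// Weighted average input price across the routing mix.
function blendedInputPrice(mix: RoutingMix): number {
  let total = 0;
  for (const model of Object.keys(INPUT_PRICE) as Model[]) {
    total += mix[model] * INPUT_PRICE[model];
  }
  return total;
}

// Assumed mix: 75% of tokens go to the two cheaper tiers.
const assumedMix: RoutingMix = { haiku: 0.5, sonnet: 0.25, opus: 0.25 };

const blended = blendedInputPrice(assumedMix);
const savingsFactor = INPUT_PRICE.opus / blended;

console.log(`blended input price: $${blended.toFixed(3)}/MTok`);
console.log(`savings vs all-flagship: ${savingsFactor.toFixed(2)}x`);
```

With this mix the blended input price is $4.625 per million tokens, a bit over 3x cheaper than all-flagship. Shifting more traffic to the small tier pushes the factor higher; your own mix and quality bar decide where you land.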
The first rule of AI cost optimization: the bill you can't see is the bill you can't control.
— A FinOps lead
The big idea: AI coding bills scale with habits, not just headcount. Caching, routing, context hygiene, and budget visibility get you 80% of the savings. Skip those and you're burning money that other teams have already learned to keep.
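Budget visibility can start very small. The following is a minimal sketch of a per-user spend tracker that flags usage as it approaches a monthly limit. Everything here is hypothetical: the `BudgetTracker` class, the `UsageEvent` shape, and the 80% warning threshold are illustrative choices, and a real setup would read costs from your provider's usage reports or gateway logs.

```typescript
// Hypothetical shape of one billed request.
interface UsageEvent {
  user: string;
  costUsd: number;
}

type BudgetStatus = "ok" | "warn" | "over";

// In-memory tracker; a real system would persist spend and reset it monthly.
class BudgetTracker {
  private readonly spendByUser = new Map<string, number>();
  private readonly monthlyLimitUsd: number;
  private readonly warnFraction: number;

  constructor(monthlyLimitUsd: number, warnFraction: number = 0.8) {
    this.monthlyLimitUsd = monthlyLimitUsd;
    this.warnFraction = warnFraction;
  }

  // Add the event's cost to the user's running total and report their status.
  record(event: UsageEvent): BudgetStatus {
    const total = (this.spendByUser.get(event.user) ?? 0) + event.costUsd;
    this.spendByUser.set(event.user, total);
    if (total > this.monthlyLimitUsd) return "over";
    if (total >= this.monthlyLimitUsd * this.warnFraction) return "warn";
    return "ok";
  }

  // Current spend for a user, or 0 if nothing has been recorded.
  spendFor(user: string): number {
    return this.spendByUser.get(user) ?? 0;
  }
}

// Usage: a $100 monthly limit with a warning at 80%.
const tracker = new BudgetTracker(100);
console.log(tracker.record({ user: "alice", costUsd: 50 })); // "ok"
console.log(tracker.record({ user: "alice", costUsd: 35 })); // "warn"
console.log(tracker.record({ user: "alice", costUsd: 20 })); // "over"
```

Returning a status instead of throwing leaves the policy to the caller: a team might only notify on `"warn"` and downgrade to a cheaper tier on `"over"`.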
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-coding-rate-limiting-costs-optimization-creators
What is the core idea behind "Rate-Limiting, Costs, and Optimization"?
Which term best describes a foundational idea in "Rate-Limiting, Costs, and Optimization"?
A learner studying Rate-Limiting, Costs, and Optimization would need to understand which concept?
Which of these is directly relevant to Rate-Limiting, Costs, and Optimization?
Which of the following is a key point about Rate-Limiting, Costs, and Optimization?
Which of these does NOT belong in a discussion of Rate-Limiting, Costs, and Optimization?
Which statement is accurate regarding Rate-Limiting, Costs, and Optimization?
Which of these does NOT belong in a discussion of Rate-Limiting, Costs, and Optimization?
What is the key insight about "These numbers move monthly" in the context of Rate-Limiting, Costs, and Optimization?
What is the key insight about "Agentic loops are the #1 cost surprise" in the context of Rate-Limiting, Costs, and Optimization?
Which statement accurately describes an aspect of Rate-Limiting, Costs, and Optimization?
What does working with Rate-Limiting, Costs, and Optimization typically involve?
Which of the following is true about Rate-Limiting, Costs, and Optimization?
Which best describes the scope of "Rate-Limiting, Costs, and Optimization"?
Which section heading best belongs in a lesson about Rate-Limiting, Costs, and Optimization?