Production Agent Patterns: Queues, Retries, Idempotency
A prototype agent and a production agent have the same LLM. What's different is everything around it — durable state, retries, idempotency, observability. The real engineering.
What this lesson covers
- Why production is different
- Durability
- Idempotency
- Retries
Why production is different
In a prototype, a crash is fine: you rerun. In production, a crash means a user's pizza never got ordered and a $4 LLM call got burned. Production agents must be durable, idempotent, retry-safe, observable, and cost-capped. Most teams discover this only after shipping a demo.
The five production requirements
| Requirement | What it means |
|---|---|
| Durable state | Every step is persisted. Process can die and resume. |
| Idempotent steps | Re-running a step is safe — no duplicate actions. |
| Retries with backoff | Transient failures retry; permanent failures surface. |
| Observability | Every tool call, every prompt, every token logged. |
| Cost + step caps | Hard ceilings prevent runaway loops and bills. |
Durable state pattern
Agents that run longer than a few seconds shouldn't live in memory. Checkpoint after every step. Options in 2026:
- Vercel Workflow DevKit (WDK) — step-based, crash-safe, powered by Queues.
- LangGraph + PostgresSaver — durable state machines.
- Temporal — mature workflow engine; strong for multi-day flows.
- Inngest — event-driven steps with retries and concurrency controls.
- Roll your own — Postgres + a state column + a worker loop.
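The roll-your-own option at the end of the list above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `CheckpointStore`, `InMemoryStore`, and `durableStep` are hypothetical names, and the `Map` stands in for a Postgres table keyed by run ID and step name. It assumes step results are JSON-serializable.

```typescript
// Hypothetical storage interface. In production this would be a Postgres
// table with a unique key on (run_id, step_name).
interface CheckpointStore {
  get(runId: string, stepName: string): Promise<string | undefined>;
  put(runId: string, stepName: string, resultJson: string): Promise<void>;
}

// In-memory stand-in so the sketch runs without a database.
class InMemoryStore implements CheckpointStore {
  private rows = new Map<string, string>();

  async get(runId: string, stepName: string): Promise<string | undefined> {
    return this.rows.get(`${runId}/${stepName}`);
  }

  async put(runId: string, stepName: string, resultJson: string): Promise<void> {
    this.rows.set(`${runId}/${stepName}`, resultJson);
  }
}

// Runs fn at most once per (runId, stepName) that reaches the checkpoint.
// On resume, a completed step returns its saved result instead of re-running.
async function durableStep<T>(
  store: CheckpointStore,
  runId: string,
  stepName: string,
  fn: () => Promise<T>,
): Promise<T> {
  const saved = await store.get(runId, stepName);
  if (saved !== undefined) {
    return JSON.parse(saved) as T;
  }
  const result = await fn();
  await store.put(runId, stepName, JSON.stringify(result));
  return result;
}
```

A worker that restarts simply calls the same steps again with the same run ID; finished steps return instantly from the store. Note the gap: a crash after `fn()` finishes but before `put()` commits re-runs the step, which is why steps with side effects also need idempotency keys.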
In the Workflow DevKit example that follows, every step() call is durable: if the process dies, execution resumes from the last completed step, with built-in retries and timeouts.
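// Vercel Workflow DevKit, using the "use workflow" directive.
// Models are addressed via the AI Gateway alias format.
// NOTE: step(name, fn, options) is kept as this lesson presents it. Verify the exact
// step API against the current Workflow DevKit docs, which describe a "use step" directive.
import { step } from 'workflow';
import { generateText } from 'ai';

// Assumed helper, defined elsewhere: researches one question and returns a summary.
declare function searchAndSummarize(question: string): Promise<string>;

export async function researchAgent(goal: string) {
  'use workflow';

  const plan = await step('plan', async () => {
    const { text } = await generateText({
      model: 'anthropic/claude-opus-4.7',
      prompt: `Break into sub-questions:\n${goal}`,
    });
    // Drop blank lines so every entry is a real sub-question.
    return text.split('\n').filter((line) => line.trim() !== '');
  });

  const findings: { q: string; answer: string }[] = [];
  for (const [i, q] of plan.entries()) {
    // Index-based names stay unique even when two questions share a 20-character prefix.
    const answer = await step(`research:${i}`, async () => {
      return await searchAndSummarize(q);
    }, { retries: 3, timeout: '60s' });
    findings.push({ q, answer });
  }

  return await step('synthesize', async () => {
    const { text } = await generateText({
      model: 'anthropic/claude-opus-4.7',
      prompt: `Write a cited answer:\n${JSON.stringify(findings)}`,
    });
    return text;
  });
}

Idempotency: the underrated superpower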
Any step that touches the outside world (send email, charge card, create ticket) needs an idempotency key. When the step retries, the external system recognizes the key and doesn't duplicate the action.
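Every external API call gets an idempotency key derived from workflow state. Stripe supports this natively through idempotency keys on requests; support varies across other providers, so check each API's documentation and, where it is missing, deduplicate on your side before making the call.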
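To see why a key derived from workflow state prevents duplicates, here is a small sketch. `FakeEmailApi` is a hypothetical in-memory stand-in for an external service that honors idempotency keys, and `keyFor` is an illustrative helper; neither is a real library API.

```typescript
// Hypothetical external API that remembers which idempotency keys it has seen.
class FakeEmailApi {
  private seen = new Map<string, string>(); // idempotency key -> message id
  public sentCount = 0;

  send(to: string, idempotencyKey: string): string {
    const existing = this.seen.get(idempotencyKey);
    if (existing !== undefined) {
      return existing; // replayed request: return the original result, send nothing
    }
    this.sentCount += 1;
    const messageId = `${to}#${this.sentCount}`;
    this.seen.set(idempotencyKey, messageId);
    return messageId;
  }
}

// The key depends only on workflow state, never on time or randomness,
// so a retry of the same step always produces the same key.
function keyFor(taskId: string, stepName: string): string {
  return `task:${taskId}:step:${stepName}`;
}

const api = new FakeEmailApi();
const key = keyFor('42', 'notify-user');
const firstId = api.send('user@example.com', key);
const retryId = api.send('user@example.com', key); // simulated retry after a crash
console.log(firstId === retryId, api.sentCount);
```

The retry gets back the original message ID and the send counter stays at one. A key built from a timestamp or a random UUID generated inside the step would defeat this, because each retry would look like a new request.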
// Idempotent Stripe charge
// The key is derived from workflow state, so a retry of the same step reuses it.
const idempotencyKey = `task:${taskId}:step:${stepName}`;
const charge = await stripe.paymentIntents.create(
  { amount: 5000, currency: 'usd', customer: custId }, // amount is in cents
  { idempotencyKey }
);
// The same key on retry returns the original PaymentIntent, so there is no double-billing.

Retry policy
| Error type | Policy |
|---|---|
| Network timeout | Retry 3x with exponential backoff (1s, 5s, 30s). |
| Rate limit (429) | Retry after Retry-After header; circuit-break after 5 attempts. |
| 5xx server error | Retry 3x; alert on repeated 503. |
| Tool schema mismatch | One retry with error fed back to model. |
| Other 4xx client error (400, 404, 422) | Do NOT retry: the same request will fail the same way. |
| Auth failure | Do NOT retry — alert, stop. |
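The table above translates fairly directly into code. The sketch below is one way to encode it, under stated assumptions: `Failure`, `Decision`, `classify`, and `withRetry` are hypothetical names, the caller supplies a `toFailure` function that maps thrown errors into the `Failure` shape, and a missing Retry-After value defaults to one second. Feeding a schema error back to the model is left to the retried operation.

```typescript
type Failure =
  | { kind: 'timeout' }
  | { kind: 'http'; status: number; retryAfterSec?: number }
  | { kind: 'schema_mismatch' };

type Decision = { retry: false } | { retry: true; delayMs: number };

const BACKOFF_MS = [1_000, 5_000, 30_000]; // delays before retry 1, 2, 3

// failures is the number of failed attempts so far (1 after the first failure).
function classify(f: Failure, failures: number): Decision {
  const backoff = (): Decision =>
    failures <= BACKOFF_MS.length
      ? { retry: true, delayMs: BACKOFF_MS[failures - 1] ?? 30_000 }
      : { retry: false };

  switch (f.kind) {
    case 'timeout':
      return backoff();
    case 'schema_mismatch':
      // One retry only; the operation should include the error in its next prompt.
      return failures === 1 ? { retry: true, delayMs: 0 } : { retry: false };
    case 'http':
      if (f.status === 429) {
        return failures < 5
          ? { retry: true, delayMs: (f.retryAfterSec ?? 1) * 1000 }
          : { retry: false }; // stop after 5 attempts
      }
      if (f.status >= 500) return backoff();
      return { retry: false }; // auth failures and other 4xx: surface immediately
  }
}

async function withRetry<T>(
  op: () => Promise<T>,
  toFailure: (err: unknown) => Failure,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let failures = 1; ; failures++) {
    try {
      return await op();
    } catch (err) {
      const decision = classify(toFailure(err), failures);
      if (!decision.retry) throw err; // permanent failure: let the caller alert
      await sleep(decision.delayMs);
    }
  }
}
```

Keeping `classify` a pure function makes the policy easy to unit test, and passing `sleep` as a parameter lets tests run without real delays.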
Observability essentials
- Trace every LLM call with prompt, response, token counts, cost — OpenTelemetry + LangSmith/Braintrust/Vercel Observability.
- Log every tool call with args, result, latency.
- Record every workflow event (start, step complete, retry, fail).
- Cost dashboards per workflow/agent/user.
- Alerts on cost spikes, error spikes, latency regressions.
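As a starting point for the tool-call logging item in the list above, here is a minimal wrapper sketch. `ToolCallRecord`, `traced`, and the `sink` callback are hypothetical names; in a real system the sink would forward each record to your tracing backend rather than an array or the console.

```typescript
interface ToolCallRecord {
  tool: string;
  args: unknown;
  ok: boolean;
  result?: unknown;
  error?: string;
  latencyMs: number;
}

// Wraps a tool function so every call emits exactly one structured record,
// whether it succeeds or throws. The original error is re-thrown unchanged.
function traced<A extends unknown[], R>(
  tool: string,
  fn: (...args: A) => Promise<R>,
  sink: (record: ToolCallRecord) => void,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const start = Date.now();
    try {
      const result = await fn(...args);
      sink({ tool, args, ok: true, result, latencyMs: Date.now() - start });
      return result;
    } catch (err) {
      sink({ tool, args, ok: false, error: String(err), latencyMs: Date.now() - start });
      throw err;
    }
  };
}
```

Wrapping once at registration time, for example `const search = traced('search', rawSearch, record => console.log(JSON.stringify(record)))` with your own `rawSearch`, means no call site can forget to log. Redact secrets from `args` before sinking if tools receive credentials.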
Cost and step caps
Non-negotiable ceilings. Cheaper to fail a task than to let a loop burn $400 of Opus calls at 3 AM.
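const MAX_STEPS = 50;
const MAX_COST_USD = 2.00;
// Example per-token rates in USD; replace with the actual prices of the model you call.
const INPUT_USD_PER_TOKEN = 3 / 1_000_000;
const OUTPUT_USD_PER_TOKEN = 15 / 1_000_000;

let stepCount = 0;
let costUsd = 0;

// state, isDone, runOneStep, and applyResult are the agent's own pieces, defined elsewhere.
while (!isDone(state)) {
  if (++stepCount > MAX_STEPS) {
    throw new Error('Step cap exceeded: possible loop.');
  }
  const { result, usage } = await runOneStep(state);
  costUsd += usage.inputTokens * INPUT_USD_PER_TOKEN + usage.outputTokens * OUTPUT_USD_PER_TOKEN;
  if (costUsd > MAX_COST_USD) {
    throw new Error(`Cost cap exceeded: $${costUsd.toFixed(2)}`);
  }
  state = applyResult(state, result);
}

Next: the security dimension. An agent is a new attack surface.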
