Long system prompts are expensive. Prompt caching lets you reuse an already-processed prefix, cutting the cost of those tokens by up to 90% and reducing latency. Here's how to architect prompts for caching.
In production AI apps, the same long preamble — system prompt, few-shot examples, long retrieved documents — is often sent on every request while only a short user message changes. Processing that preamble every time is wasteful. Prompt caching lets the provider store a processed version of the prefix and reuse it.
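To make the waste concrete, here is a rough cost sketch. It is not a pricing calculator: the multipliers are assumptions (a cache read at 10% of the base input price, matching the "up to 90%" figure, and a 25% premium on the first write), and `CachePricing` and `input_cost` are hypothetical names used only for illustration. Check your provider's current price list before relying on any of these numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CachePricing:
    """Price multipliers relative to the base input price (assumed values)."""
    write_multiplier: float = 1.25  # assumption: first write costs a premium
    read_multiplier: float = 0.10   # assumption: a cache hit costs 10% of base


def input_cost(prefix_tokens: int, suffix_tokens: int, requests: int,
               pricing: Optional[CachePricing] = None) -> float:
    """Total input cost in units of 'base price per token'.

    With pricing=None the prefix is reprocessed at full price on every call.
    Otherwise the first call writes the cache and every later call reads it
    (this assumes each call arrives before the cache entry expires).
    """
    if requests <= 0:
        return 0.0
    if pricing is None:
        return float(requests * (prefix_tokens + suffix_tokens))
    write = prefix_tokens * pricing.write_multiplier
    reads = (requests - 1) * prefix_tokens * pricing.read_multiplier
    return write + reads + requests * suffix_tokens


# 20,000-token preamble, 50-token user message, 100 requests.
uncached = input_cost(20_000, 50, requests=100)
cached = input_cost(20_000, 50, requests=100, pricing=CachePricing())
savings = 1 - cached / uncached
```

Under these assumed multipliers, the 100 requests cost roughly 89% less in input tokens than the uncached run. The saving shrinks when requests are sparse enough that the cache expires between calls, because each miss pays the write premium again.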
// Anthropic Messages API
{
  "model": "claude-sonnet-4-5",
  "max_tokens": 1024,
  "system": [
    {
      "type": "text",
      "text": "<long system prompt, role, rules, policies>",
      "cache_control": {"type": "ephemeral"}
    },
    {
      "type": "text",
      "text": "<knowledge base, 50KB of docs>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [
    {"role": "user", "content": "What's our refund policy for digital goods?"}
  ]
}
Marking two prefix chunks as cacheable. Only the final user message varies per request.
Caching works best when prefixes are layered in order from most stable to least stable. Put static rules first, then slowly changing context (user profile, session info), then the per-request message last. A cache hit requires the prefix to match exactly, so any change invalidates everything that follows it; ordering layers this way lets the longest possible prefix hit the cache.
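The layering described above can be sketched as a small request builder. This is a minimal illustration, not an official client: `text_block`, `build_request`, and `shared_prefix_blocks` are hypothetical helpers, the `max_tokens` value is arbitrary, and the code only constructs the request body without sending it.

```python
from __future__ import annotations

from typing import Any


def text_block(text: str, cache: bool = False) -> dict[str, Any]:
    """One system content block, optionally marked as a cache breakpoint."""
    block: dict[str, Any] = {"type": "text", "text": text}
    if cache:
        block["cache_control"] = {"type": "ephemeral"}
    return block


def build_request(stable_rules: str, knowledge_base: str, user_profile: str,
                  user_message: str, model: str = "claude-sonnet-4-5",
                  max_tokens: int = 1024) -> dict[str, Any]:
    """Order layers from most stable to least stable; the user message goes last."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": [
            text_block(stable_rules, cache=True),    # changes rarely
            text_block(knowledge_base, cache=True),  # changes occasionally
            text_block(user_profile, cache=True),    # changes per session
        ],
        "messages": [{"role": "user", "content": user_message}],  # never cached
    }


def shared_prefix_blocks(a: dict[str, Any], b: dict[str, Any]) -> int:
    """Count the leading system blocks that are identical in both requests."""
    count = 0
    for x, y in zip(a["system"], b["system"]):
        if x != y:
            break
        count += 1
    return count


req_alice = build_request("RULES", "DOCS", "profile: alice", "hi")
req_bob = build_request("RULES", "DOCS", "profile: bob", "hello")
reusable = shared_prefix_blocks(req_alice, req_bob)  # rules + docs are shared
```

Two different users still share the first two blocks, so those can be served from cache. Had the user profile been placed first, the requests would diverge at block zero and nothing after it could be reused.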
[STABLE SYSTEM]        <- cache forever (or until product change)
[KNOWLEDGE BASE]       <- cache for the day
[USER PROFILE]         <- cache for the session
[CONVERSATION HISTORY] <- cache up to last turn
[CURRENT USER MSG]     <- never cache
Layered cache strategy: deeper layers are reused more often.
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-prompting-caching-creators
What is the main idea of "Prompt Caching and Cost Optimization"?
Which concept is most central to "Prompt Caching and Cost Optimization"?
Which use of AI fits this topic best?
What should a careful learner remember about "The numbers (Anthropic, 2026)"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about prompt caching be treated?
Name one way to verify an AI answer about prompt caching.
Which action would help you apply "Prompt Caching and Cost Optimization" responsibly?