Long system prompts are expensive. Prompt caching lets you reuse an already-processed prefix at up to 90% lower cost on those cached tokens, and with much lower latency. Here's how to structure prompts to take advantage of it.
In production AI apps, the same long preamble — system prompt, few-shot examples, long retrieved documents — is often sent on every request while only a short user message changes. Processing that preamble every time is wasteful. Prompt caching lets the provider store a processed version of the prefix and reuse it.
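To make the "up to 90%" figure concrete, here is a minimal cost sketch. It assumes Anthropic's published multipliers for 5-minute caching at the time of writing (cache writes billed at 1.25x the base input price, cache reads at 0.1x), an assumed base price of $3 per million input tokens, and a hypothetical workload; `input_cost` and `compare` are illustrative helpers, not part of any SDK. The `usage` keys mirror the token counts the Messages API reports.

```python
# Sketch of input-token cost with and without prompt caching.
# ASSUMPTIONS: 5-minute cache multipliers (write 1.25x, read 0.1x base input
# price) and every call arriving while the cache entry is still live.
CACHE_WRITE_MULT = 1.25
CACHE_READ_MULT = 0.10


def input_cost(usage: dict, base_price_per_mtok: float) -> float:
    """Dollar cost of the input tokens for one request, given a usage dict."""
    per_token = base_price_per_mtok / 1_000_000
    return per_token * (
        usage.get("input_tokens", 0)  # uncached input
        + CACHE_WRITE_MULT * usage.get("cache_creation_input_tokens", 0)
        + CACHE_READ_MULT * usage.get("cache_read_input_tokens", 0)
    )


def compare(prefix_tokens: int, suffix_tokens: int, requests: int,
            base_price_per_mtok: float) -> tuple[float, float]:
    """Total input cost (uncached, cached) for `requests` calls sharing one prefix.

    With caching, the first call writes the prefix to the cache and every
    later call reads it back; only the short suffix is billed at full price.
    """
    uncached = requests * input_cost(
        {"input_tokens": prefix_tokens + suffix_tokens}, base_price_per_mtok)
    first = input_cost({"input_tokens": suffix_tokens,
                        "cache_creation_input_tokens": prefix_tokens},
                       base_price_per_mtok)
    later = input_cost({"input_tokens": suffix_tokens,
                        "cache_read_input_tokens": prefix_tokens},
                       base_price_per_mtok)
    return uncached, first + (requests - 1) * later


if __name__ == "__main__":
    # Hypothetical workload: 20,000-token prefix, 100-token question, 1,000 calls.
    without, with_cache = compare(20_000, 100, 1_000, 3.0)
    print(f"without caching: ${without:.2f}")
    print(f"with caching:    ${with_cache:.2f}")
```

Under these assumptions the 1,000 calls cost about $60.30 without caching and about $6.37 with it, roughly an 89% saving. The first call costs slightly more than an uncached one (the write premium), so caching pays off from the second hit onward, and the saving approaches 90% as the shared prefix dominates each request.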
Anthropic Messages API request body:
{
  "model": "claude-sonnet-4-5",
  "max_tokens": 1024,
  "system": [
    {
      "type": "text",
      "text": "<long system prompt, role, rules, policies...>",
      "cache_control": {"type": "ephemeral"}
    },
    {
      "type": "text",
      "text": "<knowledge base, 50KB of docs>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [
    {"role": "user", "content": "What's our refund policy for digital goods?"}
  ]
}
Marking two prefix chunks as cacheable. Each cache_control marker ends a cacheable prefix that includes everything before it; only the final user message varies per request.
Caching works best when the prompt is layered from most stable to least stable. The cache matches on an exact prefix, so a change anywhere invalidates the cached copy of that block and of everything after it. Put static rules first, then slowly changing context (user profile, session info), and the per-request message last. That way the longest possible prefix is served from the cache.
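One way to keep that ordering consistent in application code is to assemble the request from an ordered list of layers. The sketch below is hypothetical: `build_request` and the layer texts are illustrative, and only the field names (`model`, `max_tokens`, `system`, `messages`, `cache_control`) come from the Messages API. It also assumes Anthropic's documented limit of four cache breakpoints per request.

```python
# Hypothetical helper: build a Messages API request body from layers ordered
# most-stable first, marking chosen layers as cache breakpoints.
from typing import Any

MAX_BREAKPOINTS = 4  # assumed per-request limit on cache_control markers


def build_request(model: str, layers: list[tuple[str, bool]],
                  user_message: str, max_tokens: int = 1024) -> dict[str, Any]:
    """layers: (text, cache_it) pairs, ordered from most to least stable."""
    breakpoints = sum(1 for _, cache_it in layers if cache_it)
    if breakpoints > MAX_BREAKPOINTS:
        raise ValueError(
            f"{breakpoints} cache breakpoints requested; at most {MAX_BREAKPOINTS} allowed")
    system: list[dict[str, Any]] = []
    for text, cache_it in layers:
        block: dict[str, Any] = {"type": "text", "text": text}
        if cache_it:
            # Ends a cacheable prefix covering this block and all blocks before it.
            block["cache_control"] = {"type": "ephemeral"}
        system.append(block)
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system,
        # The volatile part goes last so it never breaks the cached prefix.
        "messages": [{"role": "user", "content": user_message}],
    }


if __name__ == "__main__":
    payload = build_request(
        "claude-sonnet-4-5",
        [("<rules and policies>", True), ("<knowledge base>", True)],
        "What's our refund policy for digital goods?",
    )
    print(payload["system"][1]["cache_control"])
```

With the official `anthropic` Python SDK, the resulting dict could be passed as `client.messages.create(**payload)`; that network call is not exercised here. Centralizing the order in one helper makes it harder for a stray timestamp or per-user string to slip in ahead of the stable layers.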
[STABLE SYSTEM]         <- reusable until the product changes
[KNOWLEDGE BASE]        <- reusable for the day
[USER PROFILE]          <- reusable for the session
[CONVERSATION HISTORY]  <- reusable up to the last turn
[CURRENT USER MSG]      <- never cached
Layered cache strategy: layers nearer the top change less often, so they are reused across more requests.
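Why the order matters can be shown with a toy model. This is a simplification, assumed for illustration: real providers match on tokens and only cache up to breakpoints, whereas here each layer is a single string and `cached_prefix_len` (a hypothetical helper) counts how many leading layers two consecutive requests share.

```python
# Toy model: the reusable part of a request is its leading run of layers
# that are identical to the previous request's layers.

def cached_prefix_len(previous: list[str], current: list[str]) -> int:
    """Number of leading layers in `current` identical to `previous`."""
    n = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # first difference: nothing after this point can be reused
        n += 1
    return n


# Stable layers first, volatile user message last.
stable_a = ["RULES", "KB", "PROFILE:alice", "HISTORY:t1", "MSG:refunds?"]
stable_b = ["RULES", "KB", "PROFILE:alice", "HISTORY:t1", "MSG:shipping?"]

# Same layers, but the volatile message placed first.
volatile_a = ["MSG:refunds?", "RULES", "KB", "PROFILE:alice", "HISTORY:t1"]
volatile_b = ["MSG:shipping?", "RULES", "KB", "PROFILE:alice", "HISTORY:t1"]

if __name__ == "__main__":
    print(cached_prefix_len(stable_a, stable_b))      # 4 layers reused
    print(cached_prefix_len(volatile_a, volatile_b))  # 0 layers reused
```

Both pairs contain exactly the same content; only the position of the changing layer differs, and that alone decides whether four layers or none can come from the cache.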