Different versions create separate cache entries, allowing you to roll back without losing savings
You must version prompts because caching requires unique identifiers
What would be the worst approach for maximizing prompt caching savings?
Using the same cache breakpoint markers consistently
Keeping system prompts consistent across requests
Placing user questions before system prompts
Caching large reference documents
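The question above turns on prefix matching: prompt caches reuse the longest leading portion of a request that matches a previous one, so placing the changing user question before the stable system prompt invalidates the match immediately. A toy sketch of that prefix check (the token lists and the `cached_prefix_tokens` helper are illustrative, not a provider API):

```python
# Toy model: prompt caches match on a leading prefix of the request,
# so any difference early in the request breaks the match.

def cached_prefix_tokens(previous: list[str], current: list[str]) -> int:
    """Count leading tokens shared with the previous request (toy model)."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n

system = ["You", "are", "a", "helpful", "assistant."]
q1, q2 = ["Explain", "RAM."], ["Define", "SSD."]

# System prompt first: the whole 5-token system prefix is reusable.
good = cached_prefix_tokens(system + q1, system + q2)
# User question first: the very first token differs, so nothing is reused.
bad = cached_prefix_tokens(q1 + system, q2 + system)
```

With the stable content first, `good` is 5 (the full system prompt); with the variable content first, `bad` is 0.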
What is the relationship between cache breakpoints and provider documentation?
Cache breakpoints are automatically detected by all AI providers
There is no such thing as cache breakpoints in prompt caching
Cache breakpoints must be explicitly marked according to each provider's documentation
Cache breakpoints are only relevant for free-tier users
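As the correct option notes, breakpoints are not auto-detected; you mark them in the request per the provider's documentation. A minimal sketch of one documented convention, Anthropic's `cache_control` block (field names follow that provider's pattern; the model name is a placeholder, and other providers use different markers):

```python
# Sketch: explicitly marking a cache breakpoint in a request payload,
# following Anthropic's documented "cache_control" convention. Other
# providers mark breakpoints differently -- check each provider's docs.

def build_request(system_prompt: str, user_question: str) -> dict:
    """Place the stable system prompt first and mark it as cacheable."""
    return {
        "model": "example-model",  # placeholder model name
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Explicit breakpoint: the prefix up to and including this
                # block becomes eligible for caching on later requests.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_question}],
    }

request = build_request("You are a helpful assistant.", "What is caching?")
```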
What occurs when the provider-defined TTL (time-to-live) for a cached prompt expires?
The cached content must be reprocessed on the next request, at full price
The user receives a notification that caching has expired
The system deletes the prompt and sends an error notification
The cache automatically moves to long-term storage
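The expiry behavior in the correct option can be modeled with a tiny in-memory sketch (purely illustrative; real caching happens server-side on the provider's infrastructure, and the refresh-on-hit behavior assumed here matches some providers' documented TTL handling but not necessarily all):

```python
class PromptCache:
    """Toy model of provider-side prompt caching with a TTL."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.entries = {}  # prompt -> timestamp of last write/refresh

    def process(self, prompt: str, now: float) -> str:
        """Return 'cached' on a live hit; otherwise reprocess at full price."""
        written = self.entries.get(prompt)
        if written is not None and now - written <= self.ttl:
            self.entries[prompt] = now  # assume the TTL refreshes on a hit
            return "cached"
        self.entries[prompt] = now  # cold or expired: full reprocessing
        return "full_price"

cache = PromptCache(ttl_seconds=300)  # e.g. a 5-minute TTL
first = cache.process("SYSTEM PROMPT", now=0)    # cold cache: full price
second = cache.process("SYSTEM PROMPT", now=60)  # within TTL: cached
third = cache.process("SYSTEM PROMPT", now=400)  # TTL lapsed: full price again
```

Once the TTL lapses, there is no error, notification, or long-term tier; the next request simply pays full price and re-establishes the cache entry.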
A developer sends the same system prompt followed by a different user question every minute for 10 minutes. How many times will the system prompt portion be charged at full price?
Only the first time
Once at the beginning and once at the end
Never, because questions are different
Every time, because the questions differ
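The counting behind the correct answer can be sketched directly (this assumes every request lands within the provider's cache TTL, which a one-minute interval comfortably satisfies for typical TTLs; some providers also bill the first cache write at a small premium over the base rate rather than exactly full price):

```python
# Count how many of the requests pay full (uncached) price for the
# system-prompt portion, assuming all requests fall within the TTL.

def full_price_charges(num_requests: int) -> int:
    system_prompt_cached = False
    charges = 0
    for _ in range(num_requests):
        if not system_prompt_cached:
            charges += 1  # cold cache: full processing of the system prompt
            system_prompt_cached = True
        # The differing user question sits after the cached prefix, so it
        # does not invalidate the cached system prompt.
    return charges

full_price_charges(10)  # -> 1: only the first request pays full price
```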
What does the '90% discount' specifically apply to in prompt caching?
The number of requests you can make
The total API bill including non-cached tokens
Only the cached tokens themselves, not the entire request cost
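The arithmetic behind the correct option is worth making concrete: because only the cached tokens are discounted, the overall bill shrinks by much less than 90% whenever a request also contains uncached tokens (the token counts and uniform price below are illustrative assumptions):

```python
# The '90% discount' applies per cached token, not to the whole bill.
# Illustrative figures: 1,000 cached + 1,000 uncached tokens per request.

CACHED_TOKENS = 1_000
UNCACHED_TOKENS = 1_000
PRICE = 1.0  # assumed price per token, arbitrary units

full_cost = (CACHED_TOKENS + UNCACHED_TOKENS) * PRICE
hit_cost = CACHED_TOKENS * PRICE * 0.1 + UNCACHED_TOKENS * PRICE

savings_pct = 100 * (1 - hit_cost / full_cost)
# Half the tokens are discounted 90%, so the overall bill drops only 45%.
```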