Long-Context Code Understanding — The 1M-Token Era

Section 1

A Million Tokens Changes the Job

Compare the options

Strategy	Cost per call	Attention quality
Paste full repo every call	Very high	Variable — worst in middle
Paste only relevant files	Moderate	Good — surgical
RAG (retrieve then prompt)	Low	Excellent if retrieval is good
Prompt caching with long context	High first call, low after	Full context with amortized cost
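
To make the cost column concrete, here is a minimal sketch of the arithmetic behind "High first call, low after". The price and both multipliers are assumptions chosen for illustration, not figures from this lesson, and the function names are hypothetical. The sketch counts only the large shared prompt; it ignores output tokens and the small per-call question.

```python
# Hypothetical numbers for illustration only; check your provider's price sheet.
PRICE_PER_MTOK = 5.00          # assumed base input price, USD per million tokens
CACHE_WRITE_MULTIPLIER = 1.25  # assumed premium on the call that writes the cache
CACHE_READ_MULTIPLIER = 0.10   # assumed discount on calls that hit the cache


def uncached_cost(prompt_tokens: int, calls: int) -> float:
    """Input cost of resending the full prompt on every call."""
    return calls * prompt_tokens / 1_000_000 * PRICE_PER_MTOK


def cached_cost(prompt_tokens: int, calls: int) -> float:
    """Input cost when the first call writes the cache and the rest read it."""
    if calls <= 0:
        return 0.0
    base = prompt_tokens / 1_000_000 * PRICE_PER_MTOK
    write = base * CACHE_WRITE_MULTIPLIER
    reads = (calls - 1) * base * CACHE_READ_MULTIPLIER
    return write + reads


if __name__ == "__main__":
    # Compare both strategies for an 800k-token prompt over 1, 2, and 10 calls.
    for calls in (1, 2, 10):
        print(
            calls,
            round(uncached_cost(800_000, calls), 2),
            round(cached_cost(800_000, calls), 2),
        )
```

Under these assumptions, caching costs more than a plain call if you ask only one question and is already cheaper by the second call. The model also assumes every later call arrives while the cache entry is still alive.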

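Cache the stable parts of a long-context prompt, such as the system instructions and the repo dump, and keep them at the front of the prompt so every call shares the same prefix. You pay to write the cache once, then read it back at a fraction of the normal input price for as long as the cache entry stays alive.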

python

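# Anthropic prompt caching example
from pathlib import Path

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": "You are a senior engineer reviewing a codebase."
        },
        {
            "type": "text",
            # Stable prefix: the whole repo dump, about 800k tokens
            "text": Path("whole_repo.txt").read_text(encoding="utf-8"),
            # Cache everything up to and including this block
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{
        "role": "user",
        "content": "What's the purpose of utils/parser.ts?"
    }]
)
# First call: writes the cache, billed at a premium over the normal
# input price for the 800k tokens.
# Later calls with an identical prefix while the cache entry is alive
# (5 minutes by default, refreshed on each hit): ~10% of the normal input price.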
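
It is worth confirming that a call actually hit the cache instead of assuming it did. The sketch below reads the cache counters that the Messages API reports on `response.usage` (`cache_creation_input_tokens` and `cache_read_input_tokens`). The helper name `cache_status` is hypothetical, and the objects here are stand-ins with made-up token counts so the example runs without an API key.

```python
from types import SimpleNamespace


def cache_status(usage) -> str:
    """Classify a response's usage block as a cache hit, write, or miss."""
    # Fields may be absent or None on some responses, so default to 0.
    written = getattr(usage, "cache_creation_input_tokens", 0) or 0
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    if read > 0:
        return f"hit: {read} tokens read from cache"
    if written > 0:
        return f"write: {written} tokens written to cache"
    return "miss: no cached tokens on this call"


# Stand-in objects with made-up token counts, shaped like response.usage.
first_call = SimpleNamespace(
    input_tokens=25,
    cache_creation_input_tokens=800_000,
    cache_read_input_tokens=0,
)
second_call = SimpleNamespace(
    input_tokens=27,
    cache_creation_input_tokens=0,
    cache_read_input_tokens=800_000,
)

print(cache_status(first_call))
print(cache_status(second_call))
```

In real use you would pass `response.usage` from the call above. A "miss" on a follow-up call usually means the prefix changed or the cache entry expired.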

Key terms in this lesson

Long-Context Code Understanding — The 1M-Token Era

A Million Tokens Changes the Job

What you can now do in one pass

Long context is not free context

Prompt caching is the game-changer

Design patterns that matter

Needle-in-haystack: the eval to know
