§4.1 · API
Prompt caching
How to cache the prefix so repeat calls cost a fraction.
Prompt caching is the single biggest lever you have on Anthropic spend if your application sends the same context multiple times. Most API users leave 50–80% of their bill on the table by not using it correctly.
How it works
When you send a prompt with a cache_control breakpoint, Anthropic caches the prefix up to that breakpoint for 5 minutes (refreshed on each hit). The first call pays a small write premium (~1.25× input). Every subsequent call that sends the same prefix pays only the read rate (~0.10× input) for those tokens.
The cache is keyed on the exact prefix. One character of drift and the prefix is "poisoned" — no cache hit.
The three rules
- Stable content first, volatile content last. Put the user's question, the current date, and any per-request variables at the END of your messages. Everything that doesn't change goes at the start.
- One breakpoint, near the end of the stable section. You don't need a breakpoint on every block. One well-placed one captures the bulk of the savings.
- Watch the minimum cacheable size. Sonnet caches at 1,024+ tokens; Haiku at 2,048+. Below those thresholds you pay the cache-write fee with no savings.
A typical mistake
[date: 2026-05-18]
[user query: "..."]
[long stable instructions block]
[cache_control breakpoint]
Nothing here will cache. The volatile date + query come before the stable block, so the stable block is never in the cacheable prefix. Reorder to:
[long stable instructions block]
[cache_control breakpoint]
[date: 2026-05-18]
[user query: "..."]
Now the instructions cache. The date and query are after the breakpoint — they don't matter for the cache key.
What "good" looks like
Production apps that get this right see cache hit rates above 70% on traffic that re-sends a stable system prompt. Below 30% means you have a prefix-ordering bug.