§1 · Fundamentals
Token economics in one minute
The four numbers that decide what a Claude call costs.
Every call to Claude is priced on four numbers. Get these into muscle memory and the rest of the cheat sheet falls into place.
The four numbers
- Input tokens — everything you sent: system prompt, history, the user message, tool definitions. Charged at the model's input rate.
- Output tokens — everything Claude wrote back. Charged at the output rate, which is usually 5× the input rate.
- Cache write tokens — when prompt caching is on, the first call charges a one-time cache-write fee, typically 1.25× the input rate for the cached span.
- Cache read tokens — every subsequent call hitting that prefix reads at roughly 0.10× the input rate. This is where the savings live.
The rule of thumb you'll use every day
> Output costs ~5× input. Cached reads cost ~0.10× input. Optimize output length and cacheable prefixes before anything else.
A worked example
Sending 4,000 input tokens + 800 output tokens to Sonnet costs about 2.4¢ per call. Cache the 3,500-token stable prefix and the second call drops to about 0.9¢ — a 62% saving that compounds every subsequent call.
That same call routed to Haiku (often a fine choice for classification, extraction, formatting) costs about 0.7¢ instead. Same prompt, 70% less spend.
What this tells you
- Cache the prefix. Anything that doesn't change between calls — instructions, schemas, examples, large documents — belongs in the cached prefix.
- Right-size the model. Reach for Haiku first; reserve Sonnet for real reasoning; reach for Opus only when Sonnet has measurably underperformed.
- Cap
max_tokens. A generous default ceiling exposes you to long-tail completions billed at full output rate.
The rest of the cheat sheet is just these three rules applied to specific situations.