§1 · Fundamentals

Token economics in one minute

The four numbers that decide what a Claude call costs.

Every call to Claude is priced on four numbers. Get these into muscle memory and the rest of the cheat sheet falls into place.

The four numbers

Input tokens — everything you sent: system prompt, history, the user message, tool definitions. Charged at the model's input rate.
Output tokens — everything Claude wrote back. Charged at the output rate, which is usually 5× the input rate.
Cache write tokens — when prompt caching is on, the first call charges a one-time cache-write fee, typically 1.25× the input rate for the cached span.
Cache read tokens — every subsequent call hitting that prefix reads at roughly 0.10× the input rate. This is where the savings live.

The rule of thumb you'll use every day

> Output costs ~5× input. Cached reads cost ~0.10× input. Optimize output length and cacheable prefixes before anything else.

A worked example

Sending 4,000 input tokens + 800 output tokens to Sonnet costs about 2.4¢ per call. Cache the 3,500-token stable prefix and the second call drops to about 0.9¢ — a 62% saving that compounds every subsequent call.

That same call routed to Haiku (often a fine choice for classification, extraction, formatting) costs about 0.7¢ instead. Same prompt, 70% less spend.

What this tells you

Cache the prefix. Anything that doesn't change between calls — instructions, schemas, examples, large documents — belongs in the cached prefix.
Right-size the model. Reach for Haiku first; reserve Sonnet for real reasoning; reach for Opus only when Sonnet has measurably underperformed.
Cap max_tokens. A generous default ceiling exposes you to long-tail completions billed at full output rate.

The rest of the cheat sheet is just these three rules applied to specific situations.