Skip to main content

Mission Control · API bay · §4.5

Batch API Planner

Anthropic's Batch API gives 50% off both input and output tokens in exchange for a 24-hour processing window. Describe the workload and we'll tell you whether the savings justify the wait — and whether to move it whole or split it.

1. Workload shape

2. Caching (optional)

Input tokens served from prompt cache (billed at 10%).

Tokens written into fresh cache (billed at 125%).

Checks the 256 MB / batch payload cap.

3. Latency tolerance

4. Verdict

Move to batch$112.50/mo savings(50% off)

Saves $112.50/mo (50%) at this volume — worth the 24h wait.

Real-time / mo

$225.00

Batch / mo

$112.50

Per call (real)

0.45¢

Per call (batch)

0.23¢

  • At 50,000 calls/mo on Claude Haiku 4.5, real-time costs $225.00/mo. Batch at 50% off would cost $112.50/mo.
  • Switching the workload to the Batch API saves $112.50/mo (50%).

5. Batch structure

At 50,000 calls/mo, you'll need 1 batchper month (cap = 100,000 requests & 256 MB per batch).

POST to /v1/messages/batches, poll status, then download the JSONL results when ready. Typical small-batch turnaround is <1h (the 24h is the ceiling, not the floor).

When the Batch API actually wins

Async jobs win, chat loses. Evals, nightly summarization, bulk classification, data labeling, backfills — perfect. Any user-facing chat surface — never.

Caching stacks on top. 50% off batch is applied AFTER cache discounts. Heavily-cached prompts see less absolute savings (you're already cheap), but the percent discount still applies.

Volume threshold. Below ~$5/mo of real-time spend the ops cost of writing a batch handler usually exceeds the savings. Above ~$50/mo you're leaving real money on the table.

Most batches finish in <1h. The 24h SLA is the ceiling. Anthropic typically clears small batches in minutes — but never design as if that's guaranteed.