Mission Control · API bay · §4.5
Batch API Planner
Anthropic's Batch API gives 50% off both input and output tokens in exchange for a 24-hour processing window. Describe the workload and we'll tell you whether the savings justify the wait — and whether to move it whole or split it.
1. Workload shape
2. Caching (optional)
Input tokens served from prompt cache (billed at 10%).
Tokens written into fresh cache (billed at 125%).
Checks the 256 MB / batch payload cap.
3. Latency tolerance
4. Verdict
Saves $112.50/mo (50%) at this volume — worth the 24h wait.
Real-time / mo
$225.00
Batch / mo
$112.50
Per call (real)
0.45¢
Per call (batch)
0.23¢
- ›At 50,000 calls/mo on Claude Haiku 4.5, real-time costs $225.00/mo. Batch at 50% off would cost $112.50/mo.
- ›Switching the workload to the Batch API saves $112.50/mo (50%).
5. Batch structure
At 50,000 calls/mo, you'll need 1 batchper month (cap = 100,000 requests & 256 MB per batch).
POST to /v1/messages/batches, poll status, then download the JSONL results when ready. Typical small-batch turnaround is <1h (the 24h is the ceiling, not the floor).
When the Batch API actually wins
Async jobs win, chat loses. Evals, nightly summarization, bulk classification, data labeling, backfills — perfect. Any user-facing chat surface — never.
Caching stacks on top. 50% off batch is applied AFTER cache discounts. Heavily-cached prompts see less absolute savings (you're already cheap), but the percent discount still applies.
Volume threshold. Below ~$5/mo of real-time spend the ops cost of writing a batch handler usually exceeds the savings. Above ~$50/mo you're leaving real money on the table.
Most batches finish in <1h. The 24h SLA is the ceiling. Anthropic typically clears small batches in minutes — but never design as if that's guaranteed.