AI Batch Cost Estimation: Forecasting GPT-5, Claude, and Gemini Token Spend Before You Run
Pre-flight cost estimation for AI batch jobs across GPT-5, Claude, and Gemini. How to forecast input and output tokens, factor in batch pricing discounts, and avoid surprise bills on a 100,000-row run.
Most teams discover token costs the same way: a finance ping after the first big batch lands. "Why was the AI bill four hundred dollars last month?" The honest answer is usually "we didn't really estimate it." Estimating is the easiest thing to do up front, and skipping it is the hardest thing to explain after the fact.
So here's how to forecast a batch before you actually submit it.
Token math, the short version
Three numbers determine your batch cost: input tokens, output tokens, and the per-token rates of whichever model you picked. Input and output tokens are priced separately, so multiply each token count by its own rate, sum the two, and you're done.
The catch is that none of those three numbers are obvious before you run. Input tokens depend on your prompt template plus your CSV. Output tokens depend on what you ask the model to produce. The per-token rate depends on the model and whether you're using batch pricing or not.
Estimating each one within 20% is enough to avoid surprises. More precision than that doesn't really change any decisions you'd make anyway.
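As a minimal sketch of that arithmetic, the function below multiplies each token count by its own rate and sums the two. The function name and the rates in the usage line are placeholders for illustration, not any provider's actual prices; substitute the per-million-token figures from your model's pricing page.

```python
def estimate_cost(input_tokens, output_tokens, input_rate_per_m, output_rate_per_m):
    """Return the estimated cost in dollars.

    Rates are expressed in dollars per million tokens, which is how
    providers typically quote them.
    """
    input_cost = input_tokens / 1_000_000 * input_rate_per_m
    output_cost = output_tokens / 1_000_000 * output_rate_per_m
    return input_cost + output_cost


# Placeholder rates: $1.00 per million input tokens, $4.00 per million output tokens.
print(estimate_cost(1_000_000, 500_000, 1.00, 4.00))  # 3.0
```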
Estimating input tokens from your CSV
Input tokens per row are roughly the tokens in your prompt template (constant) plus the tokens in the variable fields you're filling in (which vary row to row).
For a quick estimate, count the total characters in your prompt template and divide by 4. That's a reasonable token count for English text. Then take a sample of 10 to 20 representative rows from your CSV, sum the characters in each row's variable fields, divide by 4, and average across the sample. Add the template count to that average, then multiply by your row count.
For a 5,000-row batch with a 200-token prompt and an average 150-token variable payload, you're looking at 1.75 million input tokens. Knowing that number beats not knowing it.
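The sampling method above can be sketched in a few lines. This is a rough illustration, assuming rows arrive as dictionaries (the shape `csv.DictReader` produces) and using the 4-characters-per-token heuristic; the function and field names are made up for the example, and a real tokenizer will give somewhat different counts.

```python
from statistics import mean

CHARS_PER_TOKEN = 4  # rough heuristic for English text


def estimate_input_tokens(template, sample_rows, variable_fields, row_count):
    """Estimate total input tokens for a batch from a small sample of rows."""
    # Constant part: the prompt template, counted once per row.
    template_tokens = len(template) / CHARS_PER_TOKEN
    # Variable part: average token count of the filled-in fields across the sample.
    variable_tokens = mean(
        sum(len(row[field]) for field in variable_fields) / CHARS_PER_TOKEN
        for row in sample_rows
    )
    return round((template_tokens + variable_tokens) * row_count)


# An 800-character template (about 200 tokens) and two sample rows whose
# variable fields total 600 characters each (about 150 tokens).
template = "x" * 800
sample = [
    {"title": "a" * 200, "notes": "b" * 400},
    {"title": "a" * 240, "notes": "b" * 360},
]
print(estimate_input_tokens(template, sample, ["title", "notes"], 5_000))  # 1750000
```

In practice you would read the 10 to 20 sample rows from your actual CSV rather than constructing them by hand.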
Estimating output tokens from your prompt
Output tokens are bounded by what you ask for. If your prompt says "produce a 200-word product description," your output is roughly 250 to 300 tokens (200 words is around 260 tokens). If you don't specify a length at all, the model picks one, and it's almost always longer than you wanted.
Always specify an output length somewhere in the prompt. It bounds your costs and it makes them predictable. "Concise" is not a length. "120 to 160 characters" is.
Then multiply the per-row output token count by your row count. For 5,000 rows at 280 output tokens each, that's about 1.4 million output tokens.
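A small sketch of the same calculation, assuming roughly 1.3 tokens per English word (the ratio implied by "200 words is around 260 tokens"). The helper names are illustrative, and the ratio is a rule of thumb rather than a tokenizer guarantee.

```python
TOKENS_PER_WORD = 1.3  # rule of thumb for English prose


def words_to_tokens(word_count, tokens_per_word=TOKENS_PER_WORD):
    """Convert a requested word count into an approximate token count."""
    return round(word_count * tokens_per_word)


def total_output_tokens(tokens_per_row, row_count):
    """Scale a per-row output estimate up to the whole batch."""
    return tokens_per_row * row_count


print(words_to_tokens(200))              # 260
print(total_output_tokens(280, 5_000))   # 1400000
```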
The batch discount changes your decisions
OpenAI, Anthropic, and Google all offer batch APIs at roughly 50% of the standard per-token rate. The trade is asynchronous processing of up to 24 hours.
For interactive workloads, that's a bad trade. Your users won't wait. For literally everything else, it's the obvious choice. A $40 real-time run becomes a $20 batch run, and the only difference is that you get the results overnight instead of in eight minutes.
The decision rule's pretty simple: if anyone reading the output is human, and your timeline is "this week" rather than "right now," use the batch API. Half-price is half-price.
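The decision rule and the discount can be written down as two small helpers. This is a sketch under the assumptions stated above: a discount of about 50% and a turnaround of up to 24 hours. Both constants and both function names are illustrative; check your provider's current batch terms before relying on them.

```python
BATCH_DISCOUNT = 0.5          # assumed: batch is roughly half the standard rate
BATCH_TURNAROUND_HOURS = 24   # assumed: results may take up to a day


def batch_cost(realtime_cost, discount=BATCH_DISCOUNT):
    """Cost of the same job when submitted through a batch API."""
    return realtime_cost * (1 - discount)


def should_use_batch(deadline_hours, turnaround_hours=BATCH_TURNAROUND_HOURS):
    """Batch makes sense when you can wait at least the worst-case turnaround."""
    return deadline_hours >= turnaround_hours


print(batch_cost(40.0))          # 20.0
print(should_use_batch(72))      # True: results needed this week
print(should_use_batch(0.2))     # False: a user is waiting right now
```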
Sample budgets for 1K, 10K, 100K rows
Concrete examples, assuming a typical job of 200 input tokens and 280 output tokens per row at mid-tier model batch pricing:
- 1,000 rows. Around $1 to $3, depending on the model. Don't bother estimating, just run it.
- 10,000 rows. Around $10 to $30. Worth a finance heads-up but not really a discussion. Estimate before submitting, but don't lose sleep.
- 100,000 rows. Around $100 to $300. Worth a finance approval and a small test batch first. Estimate twice, run once.
Actual numbers shift with model choice (high-end models cost five to ten times mid-tier) and with prompt length. The point isn't the exact dollar figure. The point is that you can estimate within an order of magnitude in five minutes, and once you've done it once, you'll do it every time.
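To see how budgets like these scale with row count, here is a short sketch. The rates ($1.50 per million input tokens, $7.50 per million output tokens) are hypothetical batch prices chosen only to land inside the ranges above; they are not quoted from any provider, so swap in real figures before using the output.

```python
def batch_budget(rows, input_tokens_per_row=200, output_tokens_per_row=280,
                 input_rate_per_m=1.50, output_rate_per_m=7.50):
    """Estimated batch cost in dollars for a given row count.

    The default rates are hypothetical placeholders, in dollars per million tokens.
    """
    input_cost = rows * input_tokens_per_row * input_rate_per_m
    output_cost = rows * output_tokens_per_row * output_rate_per_m
    return (input_cost + output_cost) / 1_000_000


for rows in (1_000, 10_000, 100_000):
    print(f"{rows:>7,} rows: ${batch_budget(rows):,.2f}")
```

Because cost is linear in row count, each tenfold increase in rows is a tenfold increase in spend; only the rates and the per-row token counts change the slope.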
Pre-flight checklist before you submit
Before you click submit on anything over 1,000 rows, walk through this:
- Estimate input tokens using the prompt-plus-variable-fields method above.
- Cap output tokens in the prompt itself. "200-word description." "60-character meta title."
- Check the batch discount actually applies. Real-time pricing on a 100,000-row job is an expensive way to learn this lesson.
- Run a 50-row test first. Validates the prompt, gives you a real per-row cost number, and surfaces any problems before you commit to the full run.
- Multiply by your row count. If the answer's bigger than your monthly budget, pick a cheaper model or reduce the scope of what you're running.
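The last two steps of the checklist can be sketched as a single check: take the measured cost of the small test run, scale it to the full row count, and compare against the budget. The function name and the figures in the usage example are hypothetical.

```python
def preflight(test_cost, test_rows, total_rows, monthly_budget):
    """Project full-run cost from a small test batch.

    Returns (projected_cost, within_budget).
    """
    cost_per_row = test_cost / test_rows
    projected_cost = cost_per_row * total_rows
    return projected_cost, projected_cost <= monthly_budget


# Hypothetical figures: a 50-row test that cost $0.12, a 100,000-row full run,
# and a $200 monthly budget.
projected, ok = preflight(test_cost=0.12, test_rows=50,
                          total_rows=100_000, monthly_budget=200.0)
print(f"Projected: ${projected:,.2f}, within budget: {ok}")
```

If the check fails, the options are the ones listed above: a cheaper model or a smaller scope.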
Twenty minutes of estimation. Your team will thank you, especially the person who has to explain the bill at the end of the month.