Submitting a batch
Why this matters
Batch processing reduces API costs by 50% and is essential for non-real-time workloads like data labeling, embedding entire datasets, or fine-tuning preparation. Understanding batch submission prevents accidentally paying regular rates for bulk operations.
Explanation
The Batch API accepts JSONL-formatted requests bundled into a single file uploaded to OpenAI's infrastructure. Instead of making N individual API calls, you submit one batch job that processes requests asynchronously, typically completing within 24 hours. The API charges 50% of the standard rate for batch-processed requests, making it ideal for large-scale embeddings, classifications, or completions.
Under the hood, batches are queued by priority level and processed in groups. Each request in your JSONL file must be valid and self-contained: the API does not halt when an individual request fails, and instead records an error result for that request (in the file referenced by error_file_id) while the remaining requests continue. When you submit a batch using client.batches.create(), you receive a batch object with an id field; you then poll that ID to check status (validating → in_progress → finalizing → completed) or set up webhooks for completion callbacks.
Use batches for: embedding 100k product descriptions, bulk content moderation, generating training data, or any operation where you can tolerate 10-minute to 24-hour latency. Do not use for real-time chat, per-user API calls in web applications, or anything where your user is waiting for a response.
Request code
import json
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))

# Each request is self-contained; custom_id maps a result back to its input.
requests = [
    {
        "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4.1",
            "messages": [{"role": "user", "content": "Classify this: The product is amazing!"}],
            "max_tokens": 50
        }
    },
    {
        "custom_id": "request-2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4.1",
            "messages": [{"role": "user", "content": "Classify this: The product is terrible."}],
            "max_tokens": 50
        }
    }
]

# Write one JSON object per line (JSONL), not a single JSON array.
with open('batch_requests.jsonl', 'w') as f:
    for request in requests:
        f.write(json.dumps(request) + '\n')

# Upload the file with purpose='batch' so it can be used as batch input.
with open('batch_requests.jsonl', 'rb') as f:
    batch_file = client.files.create(
        file=f,
        purpose='batch'
    )

# Create the batch job; the endpoint must match the url of every request.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint='/v1/chat/completions',
    completion_window='24h'
)

print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
print(f"Request counts: {batch.request_counts}")

Authentication
Export your OpenAI API key before running code: export OPENAI_API_KEY='sk-...'. The SDK reads this at client instantiation. Alternatively, pass it directly: OpenAI(api_key='sk-...').
Response shape
| Field | Example value |
|---|---|
| id | batch_xyz123 |
| object | batch |
| endpoint | /v1/chat/completions |
| errors | null |
| input_file_id | file_abc456 |
| completion_window | 24h |
| status | validating, in_progress, finalizing, completed, failed, expired, cancelling, or cancelled |
| output_file_id | file_def789 |
| error_file_id | null |
| created_at | 1713120000 |
| in_progress_at | null |
| expires_at | 1713206400 |
| finalizing_at | null |
| completed_at | null |
| failed_at | null |
| expired_at | null |
| request_counts | object with total, completed, and failed counts |
Field guide
id: Your batch identifier. Use it to poll status or retrieve results, and save it immediately after submission.
status: Current state. A healthy batch moves through validating (input file being checked), in_progress (processing), finalizing (results being prepared), and completed (done). Poll this every 30 seconds or use webhooks.
output_file_id: The ID of the file containing the responses for requests that executed successfully. Null until the batch completes. Download with client.files.content(output_file_id). Requests that failed are written to the file referenced by error_file_id.
request_counts: The hidden gem. It shows total, completed, and failed counts while the batch is still running, so you can track progress on each poll without waiting for full completion.
completion_window: Currently '24h' is the only supported value.
expires_at: Unix timestamp at which the batch expires if it has not finished. Requests still unprocessed at that point are cancelled, so plan retries around this time.
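The timestamp fields are Unix seconds, so it can help to convert them before logging or alerting on them. A minimal sketch using the example values from the response table; the helper name to_utc is hypothetical and not part of the SDK.

```python
from datetime import datetime, timezone


def to_utc(unix_seconds: int) -> datetime:
    """Convert a Unix timestamp (seconds) into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


# Example values copied from the response table; on a real batch object
# they would come from batch.created_at and batch.expires_at.
created_at = to_utc(1713120000)
expires_at = to_utc(1713206400)

window = expires_at - created_at
print(f"Created: {created_at.isoformat()}")
print(f"Expires: {expires_at.isoformat()}")
print(f"Window:  {window}")
```

With these example values the gap between the two timestamps is exactly the 24-hour completion window.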
Setup trap
The most common error: uploading the requests file without the 'batch' purpose. You must set purpose='batch' when uploading with client.files.create(), or the file cannot be used as batch input. Additionally, the JSONL file must be newline-delimited JSON (one request object per line), not a JSON array: an array is not a valid request line, so the input file fails validation and no requests run.
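A quick local check before uploading can catch the array mistake and similar problems early. The sketch below is an illustration only: validate_batch_lines and REQUIRED_KEYS are hypothetical names, and the key list is taken from the request example in this section rather than from an official schema.

```python
import json

# Keys used by the request example in this section (an assumption, not a schema).
REQUIRED_KEYS = {"custom_id", "method", "url", "body"}


def validate_batch_lines(lines):
    """Return a list of human-readable problems found in JSONL request lines."""
    problems = []
    seen_ids = set()
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines such as a trailing newline
        try:
            request = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: not valid JSON ({exc.msg})")
            continue
        if not isinstance(request, dict):
            kind = type(request).__name__
            problems.append(f"line {number}: expected a JSON object, got {kind}")
            continue
        missing = REQUIRED_KEYS - request.keys()
        if missing:
            problems.append(f"line {number}: missing keys {sorted(missing)}")
        custom_id = request.get("custom_id")
        if custom_id is not None:
            if custom_id in seen_ids:
                problems.append(f"line {number}: duplicate custom_id {custom_id!r}")
            seen_ids.add(custom_id)
    return problems


good = '{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {}}'
bad_array = '[{"custom_id": "request-1"}]'
print(validate_batch_lines([good, bad_array]))
```

To check a file on disk, pass the open file object itself, for example validate_batch_lines(open('batch_requests.jsonl')), since iterating a text file yields its lines.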
Cost
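Batch requests cost 50% of standard pricing. A batch of 1,000 chat completions with 8k input tokens each is 8M input tokens in total. At an illustrative standard rate of $2.00 USD per 1M input tokens (check the current pricing page for your model), that is about $16.00 USD at standard pricing and $8.00 USD via batch, before output tokens. For large-scale operations the saving scales linearly: at the same illustrative rate, 10M tokens via batch = $10.00 instead of $20.00.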
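The arithmetic is easy to script so you can plug in whatever rate currently applies to your model. This is a rough sketch: batch_cost is a hypothetical helper, and EXAMPLE_RATE is a placeholder rather than a quoted price. Only the 50% discount comes from the text above.

```python
from decimal import Decimal

BATCH_DISCOUNT = Decimal("0.5")  # from the text: batch costs 50% of standard


def batch_cost(num_requests, tokens_per_request, usd_per_million_tokens):
    """Return (standard_cost, discounted_cost) in USD for the token volume."""
    total_tokens = num_requests * tokens_per_request
    rate = Decimal(str(usd_per_million_tokens))
    standard = Decimal(total_tokens) / Decimal(1_000_000) * rate
    return standard, standard * BATCH_DISCOUNT


# Placeholder rate, NOT a quoted price: substitute the current rate for your model.
EXAMPLE_RATE = "2.00"

standard, discounted = batch_cost(1_000, 8_000, EXAMPLE_RATE)
print(f"standard=${standard:.2f} batch=${discounted:.2f}")
```

Decimal is used instead of float so that the dollar amounts do not pick up binary rounding noise. The sketch covers input tokens only; output tokens would be a second call with their own rate.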
Rate limits
Batches have separate rate limits from real-time APIs. A single batch can contain up to 50,000 requests and its input file can be up to 200 MB; in addition, each model has a cap on how many prompt tokens you can have enqueued across batches at once. If a new batch would exceed the enqueued-token cap, wait for in-flight batches to finish before submitting more. Monitor batch.request_counts via the API to detect processing failures early.
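When a dataset is larger than whatever per-batch limit applies to your account, one option is to split the requests into several input files and submit them one at a time. A minimal sketch with the cap left as a parameter; chunk_requests and the sample data are hypothetical.

```python
def chunk_requests(requests, max_requests_per_batch):
    """Yield consecutive slices of `requests`, each at most the given size."""
    if max_requests_per_batch < 1:
        raise ValueError("max_requests_per_batch must be at least 1")
    for start in range(0, len(requests), max_requests_per_batch):
        yield requests[start:start + max_requests_per_batch]


# Hypothetical data: 120 request dicts with unique custom_ids.
all_requests = [{"custom_id": f"request-{i}"} for i in range(120)]

chunks = list(chunk_requests(all_requests, max_requests_per_batch=50))
print([len(chunk) for chunk in chunks])  # prints [50, 50, 20]
```

Because custom_id values stay unique across chunks, results from the separate batches can still be merged into one lookup afterwards.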
Common gotcha
Developers submit a batch, immediately try to retrieve the output_file_id from the response, and panic when it's null. The output file only exists after the batch reaches 'completed' status: you must poll the batch ID until status is 'completed' before calling client.files.content().
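One way to avoid this is to wrap the wait in a helper that only returns once the batch has stopped changing. The sketch below assumes the client is an openai.OpenAI instance and uses client.batches.retrieve(); wait_for_batch, its parameters, and the TERMINAL_STATUSES set are illustrative choices, not SDK features.

```python
import time

# Statuses after which the batch will not change again (assumed set).
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(client, batch_id, initial_delay=30.0, max_delay=600.0,
                   timeout=48 * 3600, sleep=time.sleep):
    """Poll a batch until it reaches a terminal status or the timeout elapses.

    `client` is expected to be an openai.OpenAI instance, or any object with
    a compatible `batches.retrieve(batch_id)` method.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        if waited >= timeout:
            raise TimeoutError(
                f"batch {batch_id} still {batch.status} after {waited:.0f}s"
            )
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # exponential backoff with a cap


# Usage sketch (needs a real client and batch ID, so it is left commented out):
#   batch = wait_for_batch(client, "batch_xyz123")
#   if batch.status == "completed" and batch.output_file_id:
#       output_text = client.files.content(batch.output_file_id).text
```

The sleep parameter exists so the loop can be exercised in tests without actually waiting; in production the default time.sleep applies.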
Error recovery
BadRequestError: 'The status of input file is not available'
NotFoundError: Batch not found
RateLimitError
InvalidRequestError: 'endpoint must be /v1/chat/completions'
Experienced dev note
Batch processing is not just a cost optimization: it is an architectural primitive for data pipelines. Structure your requests with meaningful custom_ids so you can map responses back to source data. The output file is also JSONL, and each line carries the custom_id of the request it answers; line order is not guaranteed to match the input, so join results back to your database on custom_id rather than on position. Also, do not fire-and-forget: implement polling with exponential backoff starting at 30 seconds, or use webhooks. Batches occasionally hang in 'in_progress' for hours; implement a 48-hour timeout and alert on jobs older than expected.
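The join on custom_id can be sketched in a few lines. Everything here is illustrative: index_by_custom_id, the source_rows data, and the hand-written sample_output (trimmed to the fields the example reads) are assumptions about shape, not captured API output.

```python
import json


def index_by_custom_id(jsonl_text):
    """Parse batch output JSONL into a dict keyed by custom_id."""
    results = {}
    for line in jsonl_text.splitlines():
        if line.strip():
            record = json.loads(line)
            results[record["custom_id"]] = record
    return results


# Hypothetical source rows; each custom_id embeds the row's primary key.
source_rows = {
    "review-101": "The product is amazing!",
    "review-102": "The product is terrible.",
}


def sample_line(custom_id, content):
    """Build one hand-written line shaped like a chat completion result."""
    body = {"choices": [{"message": {"content": content}}]}
    return json.dumps({
        "custom_id": custom_id,
        "response": {"status_code": 200, "body": body},
        "error": None,
    })


# Deliberately out of input order to show that position does not matter.
sample_output = "\n".join([
    sample_line("review-102", "negative"),
    sample_line("review-101", "positive"),
])

results = index_by_custom_id(sample_output)
for custom_id, text in source_rows.items():
    record = results.get(custom_id)
    if record is None or record.get("error"):
        label = None  # missing or failed request: handle separately
    else:
        label = record["response"]["body"]["choices"][0]["message"]["content"]
    print(custom_id, "->", label)
```

Keying on custom_id also makes gaps visible: any source row without a matching record is a request that failed or never ran, which is a natural retry list.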
Check your understanding
You submit a batch at 2 PM with 50,000 requests and polling shows 49,999 completed. At what point can you safely download the output file, and what will happen if one request fails?
Answer hint
Batches only transition to 'completed' status when every request has a result (success or error), so 49,999 of 50,000 is not yet safe: keep polling until status is 'completed'. One failed request does not fail the batch as a whole. Once the batch completes, download the file referenced by output_file_id for the successful responses; the failed request is recorded with its error details and custom_id in the file referenced by error_file_id. Both files are JSONL with one result per line.