API Advanced medium · 7 min

Submitting a batch

What you will learn

Submit multiple API requests in a single batch job to process asynchronously at discounted rates.

Why this matters

Batch processing reduces API costs by 50% and is essential for non-real-time workloads like data labeling, embedding entire datasets, or fine-tuning preparation. Understanding batch submission prevents accidentally paying regular rates for bulk operations.

Skip if: Use standard API calls directly when you need responses in under a minute, when you have fewer than 10 requests, or when latency-sensitive user interactions depend on the response. Batches are designed for async workflows, not synchronous request-response patterns.

Explanation

The Batch API accepts JSONL-formatted requests bundled into a single file uploaded to OpenAI's infrastructure. Instead of making N individual API calls, you submit one batch job that processes requests asynchronously, typically completing within 24 hours. The API charges 50% of the standard rate for batch-processed requests, making it ideal for large-scale embeddings, classifications, or completions.

Under the hood, batches are queued and processed asynchronously. Each request in your JSONL file must be valid and self-contained: the API does not halt on errors, instead recording each failure individually (failed requests are reported in a separate file referenced by error_file_id, successful ones in the file referenced by output_file_id). When you submit a batch using client.batches.create(), you receive a batch object with an id field; you then poll that ID to check status (validating → in_progress → finalizing → completed) or set up webhooks for completion callbacks.
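
The polling step described above can be sketched roughly as follows. This is a minimal sketch, not the SDK's own helper: wait_for_batch, its parameters, and the TERMINAL_STATUSES set are hypothetical names introduced here, and the set of terminal statuses is an assumption based on the batch lifecycle. The retrieve callable is injected so the loop can be exercised without network access.

```python
import time
from typing import Any, Callable

# Statuses after which a batch is assumed not to change again.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(
    retrieve: Callable[[str], Any],
    batch_id: str,
    interval_s: float = 30.0,
    timeout_s: float = 24 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll retrieve(batch_id) at a fixed interval until a terminal status."""
    waited = 0.0
    while True:
        batch = retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        if waited >= timeout_s:
            raise TimeoutError(
                f"batch {batch_id} still {batch.status} after {waited:.0f}s"
            )
        sleep(interval_s)
        waited += interval_s
```

With the SDK client from the request code, a call would look like final = wait_for_batch(client.batches.retrieve, batch.id); check final.status before reading any output file, since a terminal status is not necessarily 'completed'.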

Use batches for: embedding 100k product descriptions, bulk content moderation, generating training data, or any operation where you can tolerate 10-minute to 24-hour latency. Do not use for real-time chat, per-user API calls in web applications, or anything where your user is waiting for a response.

Request code

python

import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))

requests = [
    {
        "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4.1",
            "messages": [{"role": "user", "content": "Classify this: The product is amazing!"}],
            "max_tokens": 50
        }
    },
    {
        "custom_id": "request-2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4.1",
            "messages": [{"role": "user", "content": "Classify this: The product is terrible."}],
            "max_tokens": 50
        }
    }
]

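# Write one JSON object per line (JSONL), not a JSON array.
with open('batch_requests.jsonl', 'w') as f:
    for request in requests:
        f.write(json.dumps(request) + '\n')

# Upload the file; purpose='batch' is required for batch input files.
with open('batch_requests.jsonl', 'rb') as f:
    batch_file = client.files.create(
        file=f,
        purpose='batch'
    )

# Create the batch job; the endpoint must match the "url" of every request.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint='/v1/chat/completions',
    completion_window='24h'
)

# Save the ID: you need it to poll status and fetch results later.
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.status}")
print(f"Request counts: {batch.request_counts}")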

Authentication

Export your OpenAI API key before running code: export OPENAI_API_KEY='sk-...'. The SDK reads this at client instantiation. Alternatively, pass it directly: OpenAI(api_key='sk-...').

Response shape

Field	Example value
`id`	batch_xyz123
`object`	batch
`endpoint`	/v1/chat/completions
`errors`
`input_file_id`	file_abc456
`completion_window`	24h
`status`	validating | failed | in_progress | finalizing | completed | expired | cancelling | cancelled
`output_file_id`	file_def789
`error_file_id`
`created_at`	1713120000
`in_progress_at`
`expires_at`	1713206400
`finalizing_at`
`completed_at`
`failed_at`
`expired_at`
`request_counts`	object with total, completed, and failed counts

Field guide

id

Your batch identifier: use this to poll status or retrieve results. Save this immediately after submission.

status

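Current state: validating (input file being checked), in_progress (processing), finalizing (results being prepared), or completed (done). A batch can also end as failed, expired, or cancelled. Poll this every 30 seconds or use webhooks.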

output_file_id

The file ID containing all responses. Null until batch completes. Download with client.files.content(output_file_id).
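
As an illustration of reading the downloaded file, the sketch below maps each custom_id to the text of its chat completion. The function name parse_batch_output is hypothetical, and the line layout (custom_id, response.status_code, response.body) is an assumption about the output format for the chat completions endpoint; adjust the field access for other endpoints.

```python
import json


def parse_batch_output(jsonl_text: str) -> dict[str, str]:
    """Map each custom_id to the assistant message text of its chat completion."""
    results: dict[str, str] = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate a trailing blank line
        record = json.loads(line)
        response = record.get("response") or {}
        if response.get("status_code") != 200:
            continue  # skip requests that did not succeed
        choice = response["body"]["choices"][0]
        results[record["custom_id"]] = choice["message"]["content"]
    return results
```

Once the batch is completed, something like parse_batch_output(client.files.content(batch.output_file_id).text) would feed it the file contents.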

request_counts

The hidden gem: shows total, completed, and failed request counts while the batch is still running, so you can track progress without waiting for full completion. The counts refresh each time you retrieve the batch.
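
One way to turn these counts into a progress figure is sketched below. batch_progress is a hypothetical helper, and it assumes that completed counts only successful requests, so finished work is completed plus failed.

```python
def batch_progress(total: int, completed: int, failed: int) -> dict[str, float]:
    """Summarize request counts as fractions of the total."""
    if total <= 0:
        # Counts may still be zero while the input file is being validated.
        return {"done": 0.0, "failed": 0.0}
    return {
        "done": (completed + failed) / total,
        "failed": failed / total,
    }
```

With a retrieved batch, the call would be batch_progress(counts.total, counts.completed, counts.failed) where counts = batch.request_counts; a rising failed fraction early in the run is a signal to inspect the requests before the whole batch finishes.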

completion_window

Must be '24h', which is currently the only value the API accepts.

expires_at

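Unix timestamp at which the batch expires if it has not finished: creation time plus the completion window. Requests still unfinished at that point are cancelled, while responses already produced remain available through the output file.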

Setup trap

The most common error: uploading raw requests as a file without the 'batch' purpose. You must set purpose='batch' when uploading with client.files.create(), or the API will reject the file as batch input. Additionally, the JSONL file must be newline-delimited JSON (one request object per line), not an array: a JSON array does not parse as JSONL, so the batch fails validation instead of being processed.
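
A quick local check before uploading can catch the array mistake and a few related ones. The sketch below is illustrative only: find_jsonl_problems and REQUIRED_KEYS are hypothetical names, and the required keys are taken from the request objects shown in the request code rather than from an official schema.

```python
import json
from typing import Iterable

# Keys every request line carries in the request code above.
REQUIRED_KEYS = {"custom_id", "method", "url", "body"}


def find_jsonl_problems(lines: Iterable[str]) -> list[str]:
    """Return human-readable problems; an empty list means no problems found."""
    problems: list[str] = []
    seen_ids: set[str] = set()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            # A whole-file JSON array lands here as a single 'list' line.
            kind = type(record).__name__
            problems.append(f"line {number}: expected a JSON object, got {kind}")
            continue
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems.append(f"line {number}: missing keys {sorted(missing)}")
        custom_id = record.get("custom_id")
        if custom_id is not None:
            if custom_id in seen_ids:
                problems.append(f"line {number}: duplicate custom_id {custom_id!r}")
            seen_ids.add(custom_id)
    return problems
```

Since an open text file iterates line by line, the check can run directly on the file: with open('batch_requests.jsonl') as f: problems = find_jsonl_problems(f).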

Cost

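Batch requests cost 50% of standard pricing. As an illustration, assume a list price of $2.00 per 1M input tokens (check the current pricing page for your model). A batch of 1,000 gpt-4.1 chat completions with 8k input tokens each totals 8M input tokens, which would cost about $8.00 via batch instead of $16.00, before output tokens. At the same assumed price, 10M input tokens cost $10.00 via batch instead of $20.00.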
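
The arithmetic behind an estimate like this is simple enough to script. In the sketch below, batch_cost_usd is a hypothetical helper, and the $2.00 per 1M input tokens price is an assumed, illustrative figure rather than a quoted rate; substitute the current price for your model. Output tokens are ignored.

```python
def batch_cost_usd(
    n_requests: int,
    input_tokens_each: int,
    price_per_million_usd: float,
    batch_discount: float = 0.5,
) -> tuple[float, float]:
    """Return (standard_cost, batch_cost) in USD, counting input tokens only."""
    total_tokens = n_requests * input_tokens_each
    standard = total_tokens / 1_000_000 * price_per_million_usd
    return standard, standard * (1 - batch_discount)


# Assumed price: $2.00 per 1M input tokens (illustrative, not a quoted rate).
standard, batch = batch_cost_usd(1_000, 8_000, price_per_million_usd=2.00)
print(f"standard ${standard:.2f}, batch ${batch:.2f}")
# prints: standard $16.00, batch $8.00
```

The absolute figures scale linearly with the price you plug in; the ratio between the two stays at the 50% discount.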

Rate limits

Batches have separate rate limits from real-time APIs. A single batch can hold up to 50,000 requests in an input file of up to 200 MB, and each model has a cap on the number of enqueued batch tokens that depends on your usage tier. If a submission is rejected for exceeding a limit, wait for queued batches to finish or split the work into smaller batches before retrying. Monitor via the API with batch.request_counts to detect processing failures early.

Common gotcha

Developers submit a batch, immediately try to retrieve the output_file_id from the response, and panic when it's null. The output file only exists after the batch reaches 'completed' status: you must poll the batch ID until status is 'completed' before calling client.files.content().

Error recovery

BadRequestError: 'The status of input file is not available'

The file upload failed or the file was deleted. Re-upload the JSONL file and submit a new batch.

NotFoundError: Batch not found

The batch ID does not exist or has expired. Batches expire 7 days after completion. Save the batch ID and retrieve results within this window.

RateLimitError

You are submitting batches too frequently or have hit a queue limit. Back off and retry with increasing delays, and stagger submissions if you are submitting multiple batches.
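
A retry wrapper for the submission call might look like the sketch below. retry_with_backoff and all its parameters are hypothetical, and the delay values are illustrative defaults, not limits published by the API. The exception type is passed in so the helper does not depend on the SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    retry_on: type[Exception],
    max_attempts: int = 5,
    base_delay_s: float = 30.0,
    max_delay_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(), retrying on retry_on with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return action()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt, capped at max_delay_s.
            sleep(min(base_delay_s * 2 ** attempt, max_delay_s))
    raise RuntimeError("max_attempts must be at least 1")
```

With the SDK, the call would wrap the submission in a lambda, for example retry_with_backoff(lambda: client.batches.create(input_file_id=batch_file.id, endpoint='/v1/chat/completions', completion_window='24h'), retry_on=openai.RateLimitError), after import openai.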

BadRequestError: 'endpoint must be /v1/chat/completions'

You passed a malformed or unsupported endpoint. Use exactly '/v1/chat/completions', '/v1/embeddings', or '/v1/completions', and make sure it matches the url in every request line. Check for typos.

Experienced dev note

Batch processing is not just a cost optimization: it is an architectural primitive for data pipelines. Structure your requests with meaningful custom_ids so you can map responses back to source data. The output file is also JSONL, and each line carries the custom_id of the request it answers. Output lines are not guaranteed to be in input order, so join on custom_id rather than on line position. Also, do not fire-and-forget: implement polling with exponential backoff starting at 30 seconds, or use webhooks. Batches occasionally sit in 'in_progress' for hours; implement a 48-hour timeout and alert on jobs older than expected.
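
The join on custom_id could be sketched like this. join_results and the shape of source_rows (a dict keyed by custom_id) are hypothetical choices for illustration; a real pipeline would more likely update database rows than build a list.

```python
import json


def join_results(source_rows: dict[str, dict], output_jsonl: str) -> list[dict]:
    """Attach each output record to its source row, matching on custom_id."""
    by_id: dict[str, dict] = {}
    for line in output_jsonl.splitlines():
        if line.strip():
            record = json.loads(line)
            by_id[record["custom_id"]] = record
    joined = []
    for custom_id, row in source_rows.items():
        record = by_id.get(custom_id)  # None if the request has no output line
        joined.append({**row, "custom_id": custom_id, "result": record})
    return joined
```

Rows whose result is None are the ones to look up in the error file or resubmit, which is why the join starts from the source rows rather than from the output lines.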

Check your understanding

You submit a batch at 2 PM with 50,000 requests and polling shows 49,999 completed. At what point can you safely download the output file, and what will happen if one request fails?

Show answer hint

A batch reaches 'completed' status only once every request has a result, whether success or error, so with 49,999 of 50,000 done you keep polling. One failed request does not fail the batch: the batch still completes, the successful responses land in the file referenced by output_file_id, and the failure is recorded with error details in the file referenced by error_file_id. Download both once status is 'completed', and match lines to your requests by custom_id.

VERSION The Batch API launched in April 2024, so it requires an openai Python SDK release that includes client.batches; 1.3.0 predates it, so check the SDK changelog for the exact minimum version. '24h' is the only supported completion_window as of April 2026. Endpoint string must match exactly: '/v1/chat/completions' not '/chat/completions'.
