How to use OpenAI batch API
Quick answer
A single chat.completions.create call handles exactly one conversation; there is no batch parameter. To process many prompts you have two options: loop over your prompts (optionally issuing the requests concurrently with the async client), or use OpenAI's dedicated Batch API, which accepts a JSONL file of requests, runs them within a 24-hour window, and charges a discounted rate. Both approaches cut per-request overhead compared with firing prompts one at a time by hand.
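For large offline jobs, the dedicated Batch API takes a JSONL file where each line is one self-contained request. A minimal sketch of building that input file is below; the `custom_id` values and file name are illustrative, and the upload/start calls (`client.files.create`, `client.batches.create`) need a live API key, so they are shown as comments rather than executed:

```python
import json

# Each line of the batch input file is one self-contained request.
requests = [
    {
        "custom_id": f"request-{i}",  # your own ID, echoed back in the results file
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(
        ["Translate 'Hello' to French.", "Write a haiku about spring."], 1
    )
]

with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# With a live key you would then upload the file and start the batch, roughly:
# from openai import OpenAI
# client = OpenAI()
# batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
# batch = client.batches.create(
#     input_file_id=batch_file.id,
#     endpoint="/v1/chat/completions",
#     completion_window="24h",
# )
```

Results arrive as another JSONL file once the batch completes, with each line keyed by your `custom_id`.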
Prerequisites
- Python 3.8+
- OpenAI API key
- openai Python SDK 1.0+ (pip install openai>=1.0)
Setup
Install the official OpenAI Python SDK and set your API key as an environment variable for secure authentication.
pip install openai>=1.0

Output:

Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
The simplest approach is to loop over your prompts, sending one chat.completions.create call per prompt and collecting the responses. Each item in batch_messages below is the messages list for a separate request.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

batch_messages = [
    [{"role": "user", "content": "Translate 'Hello' to French."}],
    [{"role": "user", "content": "Summarize the benefits of AI."}],
    [{"role": "user", "content": "Write a haiku about spring."}],
]

responses = []
for messages in batch_messages:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
    responses.append(response.choices[0].message.content)

for i, text in enumerate(responses, 1):
    print(f"Response {i}: {text}")

Output:
Response 1: Bonjour
Response 2: AI enhances efficiency, automates tasks, and enables new insights.
Response 3: Blossoms gently fall, Spring whispers in soft breezes, Nature's breath renewed.
Common variations
You can also issue requests concurrently with the AsyncOpenAI client, or switch to a cheaper model such as gpt-4o-mini for cost savings. Streaming responses are supported, but are typically handled per individual request rather than across a batch.
import asyncio
import os
from openai import AsyncOpenAI

# The async client is a separate class; the sync OpenAI client has no acreate method.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def fetch_response(messages):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )
    return response.choices[0].message.content

async def main():
    batch_messages = [
        [{"role": "user", "content": "Explain quantum computing."}],
        [{"role": "user", "content": "List three AI use cases."}],
    ]
    tasks = [fetch_response(msgs) for msgs in batch_messages]
    results = await asyncio.gather(*tasks)
    for i, res in enumerate(results, 1):
        print(f"Async response {i}: {res}")

asyncio.run(main())

Output:
Async response 1: Quantum computing uses quantum bits to perform complex calculations faster than classical computers.
Async response 2: AI use cases include natural language processing, computer vision, and autonomous vehicles.
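With large batches, launching every request at once invites rate-limit errors; an asyncio.Semaphore caps how many requests are in flight at a time. A minimal sketch of the pattern, using a stand-in coroutine (fake_completion simulates an API call so the example runs without a key):

```python
import asyncio

MAX_CONCURRENT = 2  # tune to your rate limits

async def fake_completion(prompt):
    # Stand-in for an AsyncOpenAI chat.completions.create call.
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"

async def bounded_fetch(sem, prompt):
    async with sem:  # at most MAX_CONCURRENT tasks pass this point at once
        return await fake_completion(prompt)

async def run_batch(prompts):
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    tasks = [bounded_fetch(sem, p) for p in prompts]
    return await asyncio.gather(*tasks)  # results keep the input order

results = asyncio.run(run_batch(["q1", "q2", "q3", "q4"]))
```

To use this against the API, replace fake_completion with a call through AsyncOpenAI as in the example above; the semaphore logic is unchanged.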
Troubleshooting
- If you get rate limit errors, reduce batch size or add retry logic with exponential backoff.
- Ensure your messages format matches the expected chat format: a list of role-content dicts.
- Check that your API key environment variable is set correctly to avoid authentication errors.
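The backoff idea from the first bullet can be sketched as a generic retry wrapper. The flaky function below simulates a rate-limited call that succeeds on the third attempt; in real code you would catch openai.RateLimitError rather than a bare Exception, and use a larger base delay:

```python
import time

def with_retries(fn, max_attempts=5, base_delay=0.01):
    """Call fn(), retrying on failure with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, 0.04s, ...

# Simulated flaky call: fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = with_retries(flaky)  # retries until flaky() returns "ok"
```

Wrapping each client.chat.completions.create call this way keeps transient rate-limit failures from aborting the whole batch.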
Key Takeaways
- Loop over prompts (or use the dedicated Batch API for large jobs) to reduce overhead and cost.
- Use asynchronous calls to handle large batches efficiently.
- Always format messages as lists of role-content dictionaries per prompt.