How-to · Beginner · 3 min read

How to use the OpenAI Batch API

Quick answer
With the official OpenAI Python SDK, batch prompts by looping over your conversations and calling client.chat.completions.create once per conversation, or by parallelizing those calls with the async client. Each call accepts a single list of messages, not a list of conversations. For large, non-urgent workloads, OpenAI also provides a dedicated Batch API (client.batches.create) that processes a JSONL file of requests asynchronously within a 24-hour window at a reduced cost.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key with available credit
  • pip install "openai>=1.0"

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable.

  • Run pip install openai to install the SDK.
  • Set your API key in your environment: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows).
bash
pip install openai
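
When no api_key argument is passed, the SDK reads OPENAI_API_KEY from the environment automatically. A quick sanity check, assuming you exported the variable in the same shell:

python
import os
from openai import OpenAI

# Confirm the key is visible to Python before constructing the client.
print("OPENAI_API_KEY set:", "OPENAI_API_KEY" in os.environ)

# OpenAI() reads OPENAI_API_KEY from the environment; it raises if no key is found.
client = OpenAI()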

Step by step

Use the OpenAI Python SDK v1 to send multiple chat completion requests by looping over a list of conversations and calling client.chat.completions.create once per conversation; the endpoint accepts a single messages list per call, not a list of conversations. This example sends two prompts and collects both responses.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Each element is one conversation: a list of message dicts.
batch_messages = [
    [{"role": "user", "content": "Hello, who won the 2024 Olympics?"}],
    [{"role": "user", "content": "Summarize the plot of The Matrix."}]
]

# The endpoint takes one conversation per call, so loop over the batch.
replies = []
for messages in batch_messages:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )
    replies.append(response.choices[0].message.content)

for i, content in enumerate(replies):
    print(f"Response {i+1}:", content)
output
Response 1: The 2024 Olympics were held in Paris, France. [summary of winners...]
Response 2: The Matrix is a sci-fi film about a hacker who discovers reality is a simulation controlled by machines...

Common variations

You can also run the requests concurrently with asyncio and the SDK's AsyncOpenAI client, which gives better throughput than a sequential loop. Change the model parameter to use other OpenAI models such as gpt-4o-mini or gpt-4-turbo. For large jobs that can wait, OpenAI's dedicated Batch API is another option; a sketch follows the async example below.

python
import os
import asyncio
from openai import AsyncOpenAI

# The async client exposes the same methods as OpenAI, but they are awaitable.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def fetch_response(prompt):
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompts = [
        "Explain quantum computing in simple terms.",
        "What are the benefits of renewable energy?"
    ]
    # Fire both requests concurrently and wait for all of them to finish.
    tasks = [fetch_response(p) for p in prompts]
    results = await asyncio.gather(*tasks)
    for i, res in enumerate(results):
        print(f"Async response {i+1}:", res)

asyncio.run(main())
output
Async response 1: Quantum computing uses quantum bits to perform complex calculations faster than classical computers...
Async response 2: Renewable energy reduces carbon emissions, lowers energy costs, and promotes sustainability...
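
For many requests that don't need immediate answers, the dedicated Batch API accepts a JSONL file of Chat Completions requests and returns results asynchronously within a 24-hour completion window, typically at a discount. Below is a minimal sketch, assuming gpt-4o-mini and a local file named batch_input.jsonl; the prompts, filename, and one-minute polling interval are placeholders to adapt.

python
import json
import os
import time
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# One JSON request per line; custom_id lets you match results back to prompts.
requests = [
    {"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions",
     "body": {"model": "gpt-4o-mini",
              "messages": [{"role": "user", "content": "Explain quantum computing in simple terms."}]}},
    {"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions",
     "body": {"model": "gpt-4o-mini",
              "messages": [{"role": "user", "content": "What are the benefits of renewable energy?"}]}},
]
with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# Upload the file and create the batch job.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Poll until the batch reaches a terminal status.
while batch.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(60)
    batch = client.batches.retrieve(batch.id)

if batch.status == "completed":
    output = client.files.content(batch.output_file_id)
    for line in output.text.splitlines():
        result = json.loads(line)
        print(result["custom_id"], result["response"]["body"]["choices"][0]["message"]["content"])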

Troubleshooting

  • If you get a 400 Bad Request, check that messages is a single list of message dicts for one conversation; the endpoint does not accept a list of conversations in one call.
  • If you see rate limit errors (429), reduce concurrency or add retry logic with exponential backoff, as in the sketch after this list.
  • Ensure your API key is correctly set in os.environ["OPENAI_API_KEY"].
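
A minimal retry helper, assuming you only want to retry on rate-limit errors; the attempt count and backoff delays are arbitrary and worth tuning for your own traffic:

python
import time
from openai import OpenAI, RateLimitError

client = OpenAI()

def create_with_retry(messages, model="gpt-4o", max_attempts=5):
    # Retry on 429s with exponential backoff: 1s, 2s, 4s, 8s, ...
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)

response = create_with_retry([{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)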

Key Takeaways

  • The Chat Completions endpoint takes one conversation per call; batch prompts by looping over them, or use the dedicated Batch API for large offline jobs.
  • Async calls with asyncio and AsyncOpenAI parallelize requests for higher throughput.
  • Always set your API key securely via environment variables and handle rate limits with retries.
Verified 2026-04 · gpt-4o, gpt-4o-mini, gpt-4-turbo