How to use OpenAI batch API
Quick answer
The OpenAI Batch API runs large numbers of requests asynchronously at a reduced cost: you write your requests into a JSONL file, upload it, create a batch job, and collect the results when the job completes (within a 24-hour window). The official OpenAI Python SDK (v1) exposes this through client.files.create and client.batches.create. For smaller workloads that need immediate results, you can instead parallelize ordinary chat-completion calls with asyncio.
Prerequisites
- Python 3.8+
- An OpenAI API key
- pip install openai>=1.0
Setup
Install the official OpenAI Python SDK and set your API key as an environment variable.
- Run pip install openai to install the SDK.
- Set your API key in your environment: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows).
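Before making any calls, you can verify the key is actually visible to Python. This is a minimal sketch; require_api_key is a hypothetical helper name, not part of the SDK:

```python
import os

def require_api_key() -> str:
    # Fail fast with a clear message if the key was never exported.
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; see the setup steps above")
    return key
```

Running this at startup turns a confusing authentication error later on into an immediate, obvious failure.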
Step by step
The Batch API does not take a list of message arrays in a single chat.completions.create call (the messages parameter is always one conversation). Instead, you write each request as one line of a JSONL file, upload the file, and create a batch job against it. This example submits two prompts as one batch and prints the results once the job completes.
import json
import os
import time
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

prompts = [
    "Hello, who won the 2024 Olympics?",
    "Summarize the plot of The Matrix.",
]

# Each line of the input file is one self-contained request.
with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts, start=1):
        f.write(json.dumps({
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o",
                "messages": [{"role": "user", "content": prompt}],
            },
        }) + "\n")

# Upload the file, then create the batch job.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Poll until the batch reaches a terminal state (this can take minutes to hours).
while batch.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(30)
    batch = client.batches.retrieve(batch.id)

# Each output line pairs a custom_id with its response body.
for line in client.files.content(batch.output_file_id).text.splitlines():
    result = json.loads(line)
    print(result["custom_id"], result["response"]["body"]["choices"][0]["message"]["content"])
Output
Sample output once the batch completes (abridged; actual content varies):
request-1 The 2024 Summer Olympics were held in Paris, France. [summary of winners...]
request-2 The Matrix is a sci-fi film about a hacker who discovers reality is a simulation controlled by machines...
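Since each result line is standalone JSON, parsing the output file can be factored into a small helper. parse_batch_output is a hypothetical name, and it assumes the standard /v1/chat/completions batch output shape:

```python
import json

def parse_batch_output(jsonl_text: str) -> dict:
    # Map each custom_id to its assistant message text.
    results = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        body = record["response"]["body"]
        results[record["custom_id"]] = body["choices"][0]["message"]["content"]
    return results
```

Keying results by custom_id matters because the Batch API does not guarantee output lines arrive in the same order as the input file.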
Common variations
The Batch API trades latency for cost: results can take up to 24 hours to arrive, at a reduced price per token. When you need results immediately, parallelize ordinary chat-completion calls with asyncio and the SDK's AsyncOpenAI client instead. You can also change the model parameter to other OpenAI models such as gpt-4o-mini.
import os
import asyncio
from openai import AsyncOpenAI

# SDK v1 has no acreate(); async calls go through AsyncOpenAI
# with the same create() method.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def fetch_response(prompt):
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def main():
    prompts = [
        "Explain quantum computing in simple terms.",
        "What are the benefits of renewable energy?",
    ]
    results = await asyncio.gather(*(fetch_response(p) for p in prompts))
    for i, res in enumerate(results):
        print(f"Async response {i+1}:", res)

asyncio.run(main())
Output
Sample output (abridged; actual content varies):
Async response 1: Quantum computing uses quantum bits to perform complex calculations faster than classical computers...
Async response 2: Renewable energy reduces carbon emissions, lowers energy costs, and promotes sustainability...
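With many prompts, an unbounded asyncio.gather can itself trigger rate limits. A semaphore caps how many requests are in flight at once; gather_limited is a hypothetical helper, not an SDK function:

```python
import asyncio

async def gather_limited(coros, limit=5):
    # Run coroutines concurrently, but at most `limit` at a time.
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    # gather preserves input order, regardless of completion order.
    return await asyncio.gather(*(run(c) for c in coros))
```

In the async example above you would swap asyncio.gather(*tasks) for gather_limited(tasks, limit=5) to keep concurrency within your rate limit.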
Troubleshooting
- If you get a 400 Bad Request, check that every line of your JSONL input is a complete JSON object with custom_id, method, url, and body fields, and that the file was uploaded with purpose="batch".
- If you see rate limit errors on real-time calls, reduce concurrency or add retry logic with exponential backoff; batch jobs queue server-side and have their own separate limits.
- Ensure your API key is exported so os.environ["OPENAI_API_KEY"] resolves before the client is constructed.
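The retry-with-backoff advice above can be sketched as a small wrapper. with_backoff is a hypothetical helper; in production you might reach for a library such as tenacity instead:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, retry_on=(Exception,)):
    # Retry fn with exponential backoff plus a little jitter.
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Usage would look like with_backoff(lambda: client.chat.completions.create(...)), ideally narrowing retry_on to the SDK's rate-limit exception rather than catching everything.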
Key Takeaways
- Batch requests go through the Batch API: write one request per line of a JSONL file, upload it with client.files.create, and submit it with client.batches.create; a single chat.completions.create call cannot batch multiple conversations.
- Async calls with asyncio and AsyncOpenAI parallelize requests for immediate results when the Batch API's up-to-24-hour turnaround is too slow.
- Always set your API key securely via environment variables and handle rate limits with retries.