How to use OpenAI batch API
Quick answer
A single chat.completions.create call handles exactly one conversation; there is no batch parameter. To process many prompts you have two options: loop over your prompts (optionally issuing the requests concurrently with the async client), or use OpenAI's dedicated Batch API, which accepts a JSONL file of requests, runs them within a 24-hour window, and charges a discounted rate. Both approaches cut per-request overhead compared with firing prompts one at a time by hand.
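For large offline jobs, the dedicated Batch API takes a JSONL file where each line is one self-contained request. A minimal sketch of building that input file is below; the `custom_id` values and file name are illustrative, and the upload/start calls (`client.files.create`, `client.batches.create`) need a live API key, so they are shown as comments rather than executed:

```python
import json

# Each line of the batch input file is one self-contained request.
requests = [
    {
        "custom_id": f"request-{i}",  # your own ID, echoed back in the results file
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(
        ["Translate 'Hello' to French.", "Write a haiku about spring."], 1
    )
]

with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# With a live key you would then upload the file and start the batch, roughly:
# from openai import OpenAI
# client = OpenAI()
# batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
# batch = client.batches.create(
#     input_file_id=batch_file.id,
#     endpoint="/v1/chat/completions",
#     completion_window="24h",
# )
```

Results arrive as another JSONL file once the batch completes, with each line keyed by your `custom_id`.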
Prerequisites
- Python 3.8+
- OpenAI API key
- openai Python SDK 1.0+ (pip install openai>=1.0)
Setup
Install the official OpenAI Python SDK and set your API key as an environment variable for secure authentication.
pip install openai>=1.0

Output:

Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
The simplest approach is to loop over your prompts, sending one chat.completions.create call per prompt and collecting the responses. Each item in batch_messages below is the messages list for a separate request.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

batch_messages = [
    [{"role": "user", "content": "Translate 'Hello' to French."}],
    [{"role": "user", "content": "Summarize the benefits of AI."}],
    [{"role": "user", "content": "Write a haiku about spring."}],
]

responses = []
for messages in batch_messages:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
    responses.append(response.choices[0].message.content)

for i, text in enumerate(responses, 1):
    print(f"Response {i}: {text}")

Output:
Response 1: Bonjour
Response 2: AI enhances efficiency, automates tasks, and enables new insights.
Response 3: Blossoms gently fall, Spring whispers in soft breezes, Nature's breath renewed.
Common variations
You can also issue requests concurrently with the AsyncOpenAI client, or switch to a cheaper model such as gpt-4o-mini for cost savings. Streaming responses are supported, but are typically handled per individual request rather than across a batch.
import asyncio
import os
from openai import AsyncOpenAI

# The async client is a separate class; the sync OpenAI client has no acreate method.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def fetch_response(messages):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )
    return response.choices[0].message.content

async def main():
    batch_messages = [
        [{"role": "user", "content": "Explain quantum computing."}],
        [{"role": "user", "content": "List three AI use cases."}],
    ]
    tasks = [fetch_response(msgs) for msgs in batch_messages]
    results = await asyncio.gather(*tasks)
    for i, res in enumerate(results, 1):
        print(f"Async response {i}: {res}")

asyncio.run(main())

Output:
Async response 1: Quantum computing uses quantum bits to perform complex calculations faster than classical computers.
Async response 2: AI use cases include natural language processing, computer vision, and autonomous vehicles.
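With large batches, launching every request at once invites rate-limit errors; an asyncio.Semaphore caps how many requests are in flight at a time. A minimal sketch of the pattern, using a stand-in coroutine (fake_completion simulates an API call so the example runs without a key):

```python
import asyncio

MAX_CONCURRENT = 2  # tune to your rate limits

async def fake_completion(prompt):
    # Stand-in for an AsyncOpenAI chat.completions.create call.
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"

async def bounded_fetch(sem, prompt):
    async with sem:  # at most MAX_CONCURRENT tasks pass this point at once
        return await fake_completion(prompt)

async def run_batch(prompts):
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    tasks = [bounded_fetch(sem, p) for p in prompts]
    return await asyncio.gather(*tasks)  # results keep the input order

results = asyncio.run(run_batch(["q1", "q2", "q3", "q4"]))
```

To use this against the API, replace fake_completion with a call through AsyncOpenAI as in the example above; the semaphore logic is unchanged.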
Troubleshooting
- If you get rate limit errors, reduce batch size or add retry logic with exponential backoff.
- Ensure your messages format matches the expected chat format: a list of role-content dicts.
- Check that your API key environment variable is set correctly to avoid authentication errors.
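The backoff idea from the first bullet can be sketched as a generic retry wrapper. The flaky function below simulates a rate-limited call that succeeds on the third attempt; in real code you would catch openai.RateLimitError rather than a bare Exception, and use a larger base delay:

```python
import time

def with_retries(fn, max_attempts=5, base_delay=0.01):
    """Call fn(), retrying on failure with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, 0.04s, ...

# Simulated flaky call: fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = with_retries(flaky)  # retries until flaky() returns "ok"
```

Wrapping each client.chat.completions.create call this way keeps transient rate-limit failures from aborting the whole batch.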
Key Takeaways
- Loop over prompts (or use the dedicated Batch API for large jobs) to reduce overhead and cost.
- Use asynchronous calls to handle large batches efficiently.
- Always format messages as lists of role-content dictionaries per prompt.