How to batch API calls to reduce costs
Quick answer
Batch API calls by combining several prompts into a single request so that one call answers all of them. The Chat Completions API returns one completion per call, so the usual approach is to merge the prompts into one user message (or to submit many requests at once). This reduces per-call overhead and lowers costs by maximizing the work done per request.
Prerequisites
- Python 3.8+
- OpenAI API key
- pip install "openai>=1.0"
Setup
Install the OpenAI Python SDK and set your API key as an environment variable to authenticate requests.
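For example, on macOS or Linux (the key value shown is a placeholder):
export OPENAI_API_KEY="sk-..."
On Windows PowerShell, use $env:OPENAI_API_KEY = "sk-..." instead.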
pip install "openai>=1.0"
Step by step
The chat.completions.create method returns one completion per request, so to batch prompts you merge them into a single user message and ask the model to answer each item separately. This example packs three prompts into one request.
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Independent prompts to answer in a single request.
prompts = [
    "Translate 'Hello' to French.",
    "Summarize the benefits of AI.",
    "Generate a Python function to add two numbers.",
]

# Merge the prompts into one user message with numbered items.
combined = "Answer each numbered item separately:\n" + "\n".join(
    f"{i + 1}. {p}" for i, p in enumerate(prompts)
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": combined}],
)
print(response.choices[0].message.content)
Output
1. Bonjour
2. AI improves efficiency, automates tasks, and enables new innovations.
3. def add(a, b):
       return a + b
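If you need each answer as its own string, you can split the combined reply on its numbering. A minimal sketch that assumes the model kept the "1.", "2.", ... prefixes:
import re

text = response.choices[0].message.content
# Split wherever a new line starts with a number and a period, e.g. "2. ".
answers = re.split(r"\n(?=\d+\.\s)", text)
for answer in answers:
    print(answer.strip())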
Common variations
You can also send several requests concurrently with the async client, which improves throughput (each request is still billed separately). The same pattern works with other providers' models, such as claude-3-5-sonnet-20241022, via their own SDKs. For streaming responses, batch requests carefully to avoid mixing streams.
import asyncio
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def batch_async():
    prompts = [
        "Explain quantum computing.",
        "Write a haiku about spring.",
    ]
    # Send the requests concurrently; each prompt gets its own completion.
    responses = await asyncio.gather(*(
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": p}],
        )
        for p in prompts
    ))
    for i, response in enumerate(responses):
        print(f"Async response {i + 1}:", response.choices[0].message.content)

asyncio.run(batch_async())
Output
Async response 1: Quantum computing uses qubits to perform complex calculations faster than classical computers.
Async response 2: Blossoms in the breeze,
spring whispers through the green leaves,
life begins anew.
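For large offline jobs, a further variation is OpenAI's Batch API, which accepts a JSONL file of requests and processes them asynchronously within a completion window at a discounted rate. A minimal sketch (the file name requests.jsonl and the custom_id value are placeholders):
import json

from openai import OpenAI

client = OpenAI()

# Write one chat.completions request per JSONL line, tagged with a custom_id
# so you can match results back to prompts when the batch finishes.
request = {
    "custom_id": "prompt-1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {"model": "gpt-4o", "messages": [{"role": "user", "content": "Translate 'Hello' to French."}]},
}
with open("requests.jsonl", "w") as f:
    f.write(json.dumps(request) + "\n")

# Upload the file and start the batch.
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)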
Troubleshooting
If you receive errors about token limits, remember that a combined request counts every prompt against the model's context window: reduce the number of batched prompts or shorten each one. Also ensure the OPENAI_API_KEY environment variable is set correctly.
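To estimate request size before sending, you can count tokens locally. A rough sketch, assuming the tiktoken package is installed (pip install tiktoken) and your version maps gpt-4o to an encoding:
import tiktoken

# Count the tokens a combined prompt will consume before sending it.
enc = tiktoken.encoding_for_model("gpt-4o")
combined = "Answer each numbered item separately:\n1. Translate 'Hello' to French."
print(len(enc.encode(combined)), "tokens")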
Key Takeaways
- Batch multiple prompts in one API call to reduce per-call overhead and lower costs.
- Combine prompts into a single user message passed to chat.completions.create; the API returns one completion per request.
- Monitor token usage to avoid exceeding model limits when batching.
- Async batching can improve throughput but requires careful handling of responses.
- Always secure your API key via environment variables to avoid leaks.