How to batch API calls to reduce costs
Quick answer
Batch API calls by combining several prompts into a single request so that one call answers all of them. The Chat Completions API returns one completion per call, so the usual approach is to merge the prompts into one user message (or to submit many requests at once). This reduces per-call overhead and lowers costs by maximizing the work done per request.
Prerequisites
- Python 3.8+
- OpenAI API key
- pip install "openai>=1.0"
Setup
Install the OpenAI Python SDK and set your API key as an environment variable to authenticate requests.
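For example, on macOS or Linux (the key value shown is a placeholder):
export OPENAI_API_KEY="sk-..."
On Windows PowerShell, use $env:OPENAI_API_KEY = "sk-..." instead.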
pip install "openai>=1.0"
Step by step
The chat.completions.create method returns one completion per request, so to batch prompts you merge them into a single user message and ask the model to answer each item separately. This example packs three prompts into one request.
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Independent prompts to answer in a single request.
prompts = [
    "Translate 'Hello' to French.",
    "Summarize the benefits of AI.",
    "Generate a Python function to add two numbers.",
]

# Merge the prompts into one user message with numbered items.
combined = "Answer each numbered item separately:\n" + "\n".join(
    f"{i + 1}. {p}" for i, p in enumerate(prompts)
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": combined}],
)
print(response.choices[0].message.content)
Output
1. Bonjour
2. AI improves efficiency, automates tasks, and enables new innovations.
3. def add(a, b):
       return a + b
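If you need each answer as its own string, you can split the combined reply on its numbering. A minimal sketch that assumes the model kept the "1.", "2.", ... prefixes:
import re

text = response.choices[0].message.content
# Split wherever a new line starts with a number and a period, e.g. "2. ".
answers = re.split(r"\n(?=\d+\.\s)", text)
for answer in answers:
    print(answer.strip())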
Common variations
You can also send several requests concurrently with the async client, which improves throughput (each request is still billed separately). The same pattern works with other providers' models, such as claude-3-5-sonnet-20241022, via their own SDKs. For streaming responses, batch requests carefully to avoid mixing streams.
import asyncio
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def batch_async():
    prompts = [
        "Explain quantum computing.",
        "Write a haiku about spring.",
    ]
    # Send the requests concurrently; each prompt gets its own completion.
    responses = await asyncio.gather(*(
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": p}],
        )
        for p in prompts
    ))
    for i, response in enumerate(responses):
        print(f"Async response {i + 1}:", response.choices[0].message.content)

asyncio.run(batch_async())
Output
Async response 1: Quantum computing uses qubits to perform complex calculations faster than classical computers.
Async response 2: Blossoms in the breeze,
spring whispers through the green leaves,
life begins anew.
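For large offline jobs, a further variation is OpenAI's Batch API, which accepts a JSONL file of requests and processes them asynchronously within a completion window at a discounted rate. A minimal sketch (the file name requests.jsonl and the custom_id value are placeholders):
import json

from openai import OpenAI

client = OpenAI()

# Write one chat.completions request per JSONL line, tagged with a custom_id
# so you can match results back to prompts when the batch finishes.
request = {
    "custom_id": "prompt-1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {"model": "gpt-4o", "messages": [{"role": "user", "content": "Translate 'Hello' to French."}]},
}
with open("requests.jsonl", "w") as f:
    f.write(json.dumps(request) + "\n")

# Upload the file and start the batch.
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)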
Troubleshooting
If you receive errors about token limits, remember that a combined request counts every prompt against the model's context window: reduce the number of batched prompts or shorten each one. Also ensure the OPENAI_API_KEY environment variable is set correctly.
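To estimate request size before sending, you can count tokens locally. A rough sketch, assuming the tiktoken package is installed (pip install tiktoken) and your version maps gpt-4o to an encoding:
import tiktoken

# Count the tokens a combined prompt will consume before sending it.
enc = tiktoken.encoding_for_model("gpt-4o")
combined = "Answer each numbered item separately:\n1. Translate 'Hello' to French."
print(len(enc.encode(combined)), "tokens")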
Key Takeaways
- Batch multiple prompts in one API call to reduce per-call overhead and lower costs.
- Combine prompts into a single user message passed to chat.completions.create; the API returns one completion per request.
- Monitor token usage to avoid exceeding model limits when batching.
- Async batching can improve throughput but requires careful handling of responses.
- Always secure your API key via environment variables to avoid leaks.