How to batch LLM requests in Python
Quick answer
Batch LLM requests in Python by sending one chat.completions.create call per prompt and running those calls concurrently, either with the async OpenAI client or a thread pool. For large offline workloads, the OpenAI Batch API accepts a file of requests and processes them at a discounted rate. Note that packing several prompts into the messages list of a single call does not batch them: messages represents one conversation, and the API returns a single completion for it.
Prerequisites
- Python 3.8+
- An OpenAI API key
- pip install openai>=1.0
Setup
Install the official openai Python package (v1+) and set your API key as an environment variable for secure authentication.
pip install openai>=1.0
output:
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
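Before making any calls, it can help to fail fast when the key is missing. This small helper is an illustrative addition (the function name is my own, not part of the SDK):

```python
import os

def check_api_key() -> bool:
    # Fail fast with a clear message if OPENAI_API_KEY is not set,
    # instead of getting an authentication error mid-batch.
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        print("OPENAI_API_KEY is not set; export it before running the examples.")
        return False
    return True
```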
Step by step
Loop over your prompts and send one chat.completions.create call per prompt. Each call takes a single-message conversation and returns one completion; collecting the results in loop order keeps each response matched to its prompt.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Prompts to batch; each one becomes its own request
prompts = [
    "Translate 'Hello' to French.",
    "Summarize the benefits of AI.",
    "Write a haiku about spring.",
]

# Send one call per prompt; messages is a single conversation,
# so each request carries exactly one user message
responses = []
for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    responses.append(response)

# Extract and print each completion in prompt order
for i, response in enumerate(responses, start=1):
    print(f"Response {i}:", response.choices[0].message.content)
output:
Response 1: Bonjour
Response 2: AI improves efficiency, automates tasks, and enables new insights.
Response 3: Blossoms gently fall, Spring whispers in soft breezes, Nature's poem blooms.
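If you prefer to stay synchronous, per-prompt calls can also run in parallel on a thread pool. The sketch below uses a stand-in complete() function (an assumption for illustration; in practice it would wrap client.chat.completions.create) so the orchestration pattern is clear and self-contained:

```python
from concurrent.futures import ThreadPoolExecutor

def complete(prompt: str) -> str:
    # Stand-in for a real completion call; swap in a call to
    # client.chat.completions.create when using the API.
    return f"answer to: {prompt}"

def batch_complete(prompts, max_workers=4):
    # executor.map runs complete() concurrently across threads
    # but still yields results in the order of the input prompts.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(complete, prompts))
```

Because executor.map preserves input order, results line up with prompts without any extra bookkeeping.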
Common variations
- Async requests: Use the AsyncOpenAI client with asyncio.gather to run requests concurrently for better throughput.
- Streaming: Batched streaming is not supported; stream each request individually.
- Different models: Change the model parameter to any supported OpenAI model, such as gpt-4o-mini or gpt-4o. Models from other providers (for example Anthropic's claude-3-5-sonnet-20241022) require that provider's own SDK.
- SDK alternatives: Anthropic and other providers support similar batching; Anthropic's Message Batches API likewise accepts many independent requests in one submission.
import asyncio
import os
from openai import AsyncOpenAI

async def batch_async():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    prompts = [
        "Explain recursion.",
        "What is RAG in AI?",
    ]
    # Create one coroutine per prompt and run them concurrently;
    # asyncio.gather returns results in the order of the inputs
    tasks = [
        client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        for prompt in prompts
    ]
    responses = await asyncio.gather(*tasks)
    for i, response in enumerate(responses, start=1):
        print(f"Async response {i}:", response.choices[0].message.content)

asyncio.run(batch_async())
output:
Async response 1: Recursion is a programming technique where a function calls itself to solve smaller instances of a problem.
Async response 2: RAG stands for Retrieval-Augmented Generation, combining retrieval of documents with LLM generation.
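For large jobs that do not need immediate results, the OpenAI Batch API takes a JSONL file with one request per line. The sketch below builds that file; the upload and submission calls are shown as comments since they require a live API key (the field names follow the documented batch input format, but treat the exact shapes as something to verify against current docs):

```python
import json

def batch_request_line(custom_id: str, prompt: str, model: str = "gpt-4o-mini") -> str:
    # One JSONL line per request; custom_id lets you match
    # results back to prompts when the batch completes.
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

prompts = ["Explain recursion.", "What is RAG in AI?"]
lines = [batch_request_line(f"req-{i}", p) for i, p in enumerate(prompts, start=1)]
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines))

# Submitting the file then looks roughly like this (requires an API key):
#   from openai import OpenAI
#   client = OpenAI()
#   batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
#   batch = client.batches.create(
#       input_file_id=batch_file.id,
#       endpoint="/v1/chat/completions",
#       completion_window="24h",
#   )
```

The trade-off is latency for cost: batch jobs can take hours to complete but are billed at a discount compared to synchronous calls.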
Troubleshooting
- If you receive a 400 Bad Request, ensure your messages parameter is a list of valid message objects with role and content keys.
- If you send too many concurrent requests, you may hit rate limits (429 errors) or per-request token limits; reduce concurrency or split the batch into smaller chunks.
- Check that your API key environment variable is set correctly to avoid authentication errors.
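One way to stay under rate limits is to cap how many requests are in flight with an asyncio.Semaphore. This sketch uses a stand-in complete() coroutine (an illustrative assumption; in practice it would await a call on an AsyncOpenAI client):

```python
import asyncio

async def complete(prompt: str) -> str:
    # Stand-in for an async completion call.
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"

async def bounded_batch(prompts, limit: int = 2):
    # The semaphore caps concurrent in-flight requests, which helps
    # avoid 429 rate-limit errors on large batches.
    sem = asyncio.Semaphore(limit)

    async def bounded(prompt):
        async with sem:
            return await complete(prompt)

    # gather still returns results in input order.
    return await asyncio.gather(*(bounded(p) for p in prompts))

results = asyncio.run(bounded_batch(["a", "b", "c", "d"]))
print(results)
```

For production use you would typically pair this with retries and exponential backoff on 429 responses.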
Key Takeaways
- Each chat.completions.create call handles one conversation and returns one completion; batch by sending one call per prompt.
- Use AsyncOpenAI with asyncio.gather for concurrent processing to improve throughput.
- For large offline workloads, the OpenAI Batch API processes a file of requests at a discounted rate.
- Respect token and request size limits by controlling batch size.
- Always validate message format and environment variables to avoid common errors.