OpenAIError
openai.OpenAIError (embedding batch size too large)
Stack trace
openai.OpenAIError: The request batch size is too large. Maximum allowed batch size is 2048 tokens or 2048 items depending on the model.
Why it happens
The embeddings API enforces strict limits on the number of input texts or tokens processed in a single batch request. Sending too many inputs at once exceeds these limits, triggering an API error to protect service stability.
Detection
Watch API error responses for OpenAIError messages reporting that the batch size limit was exceeded. Log the number of inputs in each batch before sending the request so that oversized batches are caught early.
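The logging step described above can be sketched as a small pre-flight check. This is a minimal illustration, not part of the OpenAI SDK: check_batch and MAX_ITEMS_PER_REQUEST are hypothetical names, and the value 2048 is taken from the error message shown earlier, so confirm it against the current documentation for your model.

```python
import logging

logger = logging.getLogger("embeddings")

# Assumed item limit, copied from the error message above; verify before relying on it.
MAX_ITEMS_PER_REQUEST = 2048


def check_batch(batch, max_items=MAX_ITEMS_PER_REQUEST):
    """Log the batch size and report whether it fits the assumed item limit."""
    size = len(batch)
    logger.info("embedding batch size: %d items", size)
    if size > max_items:
        logger.warning(
            "batch of %d items exceeds the assumed limit of %d", size, max_items
        )
        return False
    return True
```

Calling check_batch(batch) immediately before each embeddings request leaves a record of every batch size in the logs, which makes an oversized batch easy to spot when the error appears.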
Causes & fixes
Cause: Sending more input texts in one embeddings request than the model's maximum batch size allows.
Fix: Split your input list into smaller batches that respect the model's documented maximum batch size before calling the embeddings API.
Cause: Ignoring the per-request token limit when the inputs are very long texts.
Fix: Calculate the total token count of the inputs in a batch and reduce the batch size if that count exceeds the model's maximum allowed tokens per request.
Cause: Relying on an outdated or incorrect batch size assumption that no longer matches the current API limits.
Fix: Check the latest OpenAI embeddings API documentation for the current batch size limits and update your batching logic accordingly.
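The first two fixes can be combined into one batching helper that closes a batch when either the item count or the token total would exceed its limit. The sketch below is an illustration under stated assumptions: batch_by_limits is a hypothetical helper, count_tokens is any callable you supply (for instance one built on a tokenizer library such as tiktoken), and both limits are parameters because the correct values depend on the model and the current documentation.

```python
from typing import Callable, Iterator, List


def batch_by_limits(
    texts: List[str],
    count_tokens: Callable[[str], int],
    max_items: int,
    max_tokens: int,
) -> Iterator[List[str]]:
    """Yield batches that stay within both the item limit and the token limit."""
    batch: List[str] = []
    batch_tokens = 0
    for text in texts:
        n = count_tokens(text)
        if n > max_tokens:
            # A single text over the token limit cannot be fixed by batching.
            raise ValueError(f"one input has {n} tokens, above the limit of {max_tokens}")
        # Close the current batch if adding this text would break either limit.
        if batch and (len(batch) >= max_items or batch_tokens + n > max_tokens):
            yield batch
            batch, batch_tokens = [], 0
        batch.append(text)
        batch_tokens += n
    if batch:
        yield batch
```

Each yielded batch can then be passed as the input argument of a single embeddings request. Inputs that are individually too long need to be truncated or split before batching, which this sketch leaves to the caller.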
Code: broken vs fixed
Broken:

from openai import OpenAI
import os

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
# Placeholder data: 5000 short texts, too many for a single request
inputs = [f'text{i}' for i in range(1, 5001)]
response = client.embeddings.create(model='text-embedding-3-large', input=inputs)  # This line triggers the error

Fixed:

from openai import OpenAI
import os

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
# Placeholder data: the same 5000 short texts
inputs = [f'text{i}' for i in range(1, 5001)]
# Split inputs into batches of at most 1000 to avoid the batch size error
batch_size = 1000
embeddings = []
for i in range(0, len(inputs), batch_size):
    batch = inputs[i:i + batch_size]
    response = client.embeddings.create(model='text-embedding-3-large', input=batch)  # Batch is within the limit
    embeddings.extend(response.data)
print(f"Processed {len(embeddings)} embeddings successfully.")

Workaround
Catch the OpenAIError exception, then retry the embeddings request with smaller batches by slicing the input list dynamically.
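One way to sketch this retry is to halve the batch size each time a request fails and resume from the same position in the list. The helper below is a hypothetical illustration, not an SDK feature: embed_with_backoff, embed_fn, and retry_on are names introduced here, and the function is written against a generic callable so that it does not depend on any particular client.

```python
from typing import Callable, List, Sequence, Type


def embed_with_backoff(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], list],
    batch_size: int,
    retry_on: Type[Exception] = Exception,
) -> list:
    """Embed texts in batches, halving the batch size whenever embed_fn raises."""
    results = []
    i = 0
    while i < len(texts):
        batch = list(texts[i:i + batch_size])
        try:
            results.extend(embed_fn(batch))
        except retry_on:
            if batch_size == 1:
                # Shrinking further is impossible, so surface the error.
                raise
            batch_size = max(1, batch_size // 2)
            continue  # Retry the same position with a smaller slice.
        i += len(batch)
    return results
```

With the client from the example above, embed_fn could be lambda batch: client.embeddings.create(model='text-embedding-3-large', input=batch).data and retry_on could be openai.OpenAIError. Because OpenAIError also covers failures unrelated to batch size, such as authentication problems, it is worth checking the error message before retrying in production code.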
Prevention
Implement automatic batching in your embedding pipeline that respects the model's documented batch size and token limits, and monitor API error responses so that you can lower batch sizes as soon as the limits change.