OpenAIError
openai.OpenAIError (embedding batch size too large)
Stack trace
openai.OpenAIError: The batch size for embeddings is too large. Reduce the number of inputs per request and try again.
Why it happens
Embedding endpoints cap how many input texts, and how many tokens, a single request may contain. A batch that exceeds a cap is rejected with an API error instead of being processed. These caps protect the service from overload and keep latency consistent.
Detection
Monitor API error responses for OpenAIError messages indicating batch size limits exceeded. Log batch sizes before sending embedding requests to catch oversized batches early.
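A minimal sketch of logging batch sizes before a request is sent. The helper name check_batch and the threshold MAX_INPUTS_PER_REQUEST are hypothetical and not part of the openai library; take the real threshold from the limits documented for the model you call.

```python
import logging

logger = logging.getLogger("embeddings")

# Hypothetical threshold: replace with the per-request input limit
# documented for the embedding model you call.
MAX_INPUTS_PER_REQUEST = 2048


def check_batch(texts, max_inputs=MAX_INPUTS_PER_REQUEST):
    """Log the batch size and report whether it fits within max_inputs."""
    size = len(texts)
    logger.info("embedding batch size: %d (limit %d)", size, max_inputs)
    if size > max_inputs:
        logger.warning(
            "batch of %d inputs exceeds limit %d; split it before sending",
            size,
            max_inputs,
        )
        return False
    return True
```

Calling check_batch(texts) immediately before the embedding request puts the batch size in the logs, so an oversized batch can be spotted before the API rejects it.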
Causes & fixes
Cause: Sending more texts in a single embedding request than the API's batch size limit allows.
Fix: Split the input texts into batches within the allowed size before calling the embedding API.
Cause: Ignoring the model-specific limits on tokens or input length within a batch.
Fix: Check the model documentation for token and input-length limits per batch, enforce them, and adjust the batch size accordingly.
Cause: Using one fixed batch size without adjusting for input length or API feedback.
Fix: Implement dynamic batching that adapts the batch size to input length and to API error responses.
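Batching that adapts to input length could look roughly like the sketch below. The function names, the default budgets (max_inputs=16, max_tokens=8000), and the four-characters-per-token estimate are all assumptions made for illustration, not documented limits; a real tokenizer can be passed in through count_tokens for accurate counts.

```python
def estimate_tokens(text):
    """Rough token estimate: about 4 characters per token (an assumption).

    Swap in a real tokenizer for accurate counts.
    """
    return max(1, len(text) // 4)


def token_aware_batches(texts, max_inputs=16, max_tokens=8000,
                        count_tokens=estimate_tokens):
    """Yield lists of texts that stay under an input-count and a token budget.

    max_inputs and max_tokens are hypothetical defaults, not documented limits.
    A single text larger than max_tokens is yielded alone so the caller can
    truncate or reject it.
    """
    batch, batch_tokens = [], 0
    for text in texts:
        n = count_tokens(text)
        # Close the current batch if adding this text would break either budget.
        if batch and (len(batch) >= max_inputs or batch_tokens + n > max_tokens):
            yield batch
            batch, batch_tokens = [], 0
        batch.append(text)
        batch_tokens += n
    if batch:
        yield batch
```

Each yielded batch can then be passed as the input of one embedding request. Short texts fill a batch up to the input count, while long texts close it earlier on the token budget.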
Code: broken vs fixed
# Broken: sends every text in a single request
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Placeholder inputs; a real failing batch is larger than the endpoint accepts
texts = [f"text{i}" for i in range(1, 101)]
# This call triggers the "batch size too large" error when the list exceeds the limit
response = client.embeddings.create(model="text-embedding-3-large", input=texts)
print(response)

# Fixed: sends the texts in smaller batches
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
texts = [f"text{i}" for i in range(1, 101)]
# Split texts into smaller batches to avoid the batch size error
batch_size = 16
responses = []
for i in range(0, len(texts), batch_size):
    batch = texts[i:i + batch_size]
    response = client.embeddings.create(model="text-embedding-3-large", input=batch)
    # response.data holds one embedding object per input, in input order
    responses.extend(response.data)
print(responses)  # Now works without the batch size error

Workaround
Catch the OpenAIError exception, then automatically retry the embedding request with smaller batch sizes until successful.
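The retry idea can be sketched as a loop that halves the batch size after each failed request. The function name embed_with_shrinking_batches and its parameters are hypothetical; embed_batch stands for any callable that sends one batch and returns a list of results, which keeps the sketch independent of a particular client.

```python
def embed_with_shrinking_batches(embed_batch, texts, batch_size=16,
                                 retry_on=(Exception,)):
    """Embed texts in batches, halving the batch size whenever a request fails.

    embed_batch: callable taking a list of texts and returning a list of results.
    retry_on: exception types that should trigger a retry with a smaller batch.
    """
    results = []
    i = 0
    while i < len(texts):
        batch = texts[i:i + batch_size]
        try:
            results.extend(embed_batch(batch))
        except retry_on:
            if batch_size == 1:
                # A single input still fails, so shrinking cannot help.
                raise
            batch_size = max(1, batch_size // 2)
            continue  # retry from the same position with the smaller size
        i += len(batch)
    return results
```

With the OpenAI client, embed_batch could be lambda batch: client.embeddings.create(model="text-embedding-3-large", input=batch).data, with retry_on=(openai.OpenAIError,). Because OpenAIError is the library's base exception, this also retries on failures that smaller batches cannot fix, such as authentication errors; those surface once the batch size reaches 1, so a narrower exception type may be a better choice.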
Prevention
Implement batching logic that respects the documented maximum batch size and input length limits for embedding models, and monitor API error messages to adjust dynamically.