InvalidRequestError
openai.InvalidRequestError (embedding input too long)
Stack trace
openai.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 9000 tokens. Please reduce the length of the input.
Why it happens
OpenAI embedding models accept a fixed maximum number of input tokens (8,191 in the error shown above). When an input tokenizes to more than that, the API rejects the request with an InvalidRequestError that reports both the limit and the number of tokens requested. The limit is measured in tokens, not characters, so the length of the string alone does not tell you whether a request will succeed.
Detection
Before sending embedding requests, measure the token count of the input text using a tokenizer compatible with the embedding model and log or assert if it exceeds the limit.
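The check described above could look like the following minimal sketch. The names `exceeds_limit` and `MAX_EMBEDDING_TOKENS` are illustrative and not part of any library. The `encode` parameter is assumed to be any callable that turns a string into a list of token ids; for OpenAI embedding models that would typically be `tiktoken.get_encoding("cl100k_base").encode`.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)

# Limit quoted in the error message above; adjust it for other models.
MAX_EMBEDDING_TOKENS = 8191


def exceeds_limit(
    text: str,
    encode: Callable[[str], List[int]],
    limit: int = MAX_EMBEDDING_TOKENS,
) -> bool:
    """Return True, and log a warning, if `text` tokenizes to more than `limit` tokens."""
    n_tokens = len(encode(text))
    if n_tokens > limit:
        logger.warning("Embedding input has %d tokens; limit is %d", n_tokens, limit)
        return True
    return False
```

Passing the tokenizer in as a callable keeps the check independent of any one tokenizer library and makes it easy to exercise with a stub in tests.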
Causes & fixes
Cause: The input text is longer than the embedding model's maximum token limit.
Fix: Truncate the input, or split it into chunks that each fit under the token limit, before calling the embedding API.
Cause: The token count was estimated (for example from characters or words) and the estimate came out lower than the model's actual tokenization.
Fix: Count tokens with the same tokenizer the embedding model uses (e.g., tiktoken for OpenAI models) before sending.
Cause: Several texts were concatenated into one input string, and the combined string exceeds the token limit.
Fix: Send each text as a separate input, and make sure every individual input stays within the token limit.
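One way to apply the splitting and batching fixes together is sketched below. The function name `prepare_embedding_inputs` and its parameters are hypothetical; `encode` and `decode` are assumed to be the tokenizer's own functions (with tiktoken, the `encode` and `decode` methods of an encoding object). Each text is kept as its own item, and only items that are over the limit are split.

```python
from typing import Callable, List, Sequence, Tuple


def prepare_embedding_inputs(
    texts: Sequence[str],
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    limit: int = 8191,
) -> List[Tuple[int, str]]:
    """Return (source_index, chunk) pairs in which every chunk has at most `limit` tokens."""
    prepared: List[Tuple[int, str]] = []
    for index, text in enumerate(texts):
        tokens = encode(text)
        if len(tokens) <= limit:
            # Short enough: pass the original text through unchanged.
            prepared.append((index, text))
            continue
        # Too long: cut the token list into consecutive windows of `limit` tokens.
        for start in range(0, len(tokens), limit):
            prepared.append((index, decode(tokens[start:start + limit])))
    return prepared
```

The chunk strings can then be passed to the embeddings endpoint as a list in `input` rather than joined into one string. Note that this sketch cuts on token boundaries, which may fall in the middle of a word or sentence; splitting on paragraphs first is a common refinement.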
Code: broken vs fixed
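# Broken: the input is sent without checking its token count
from openai import OpenAI

client = OpenAI()
# Roughly 9,000 tokens; a single repeated character such as "A" * 9000
# tokenizes to far fewer tokens and would not trigger the error.
text = "hello " * 9000
# Raises the error above (openai.BadRequestError in openai>=1.0,
# openai.InvalidRequestError in older releases)
response = client.embeddings.create(model="text-embedding-3-large", input=text)
print(response)

# Fixed: count tokens with tiktoken and truncate before calling the API
import os
from openai import OpenAI
import tiktoken

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
text = "hello " * 9000  # Very long input

# Use tiktoken to count tokens accurately. If your tiktoken release does not
# recognize this model name, tiktoken.get_encoding("cl100k_base") is the fallback.
tokenizer = tiktoken.encoding_for_model("text-embedding-3-large")
tokens = tokenizer.encode(text)
max_tokens = 8191
if len(tokens) > max_tokens:
    # Truncate to max tokens
    tokens = tokens[:max_tokens]
    text = tokenizer.decode(tokens)

response = client.embeddings.create(model="text-embedding-3-large", input=text)  # Fixed: input truncated
print(response)
Workaround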
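Catch the error (openai.InvalidRequestError in SDK versions before 1.0; openai.BadRequestError in 1.0 and later, which is the client the code above uses), split the input text into chunks under the token limit, and retry the embedding call on each chunk separately. Each chunk then gets its own embedding, so the caller has to decide how to store or combine them.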
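A minimal sketch of this catch-and-retry pattern follows. Everything named here is hypothetical: `embed` stands for a function that calls the embeddings API for one string and returns its vector, `split` for a function that cuts a text into chunks under the token limit, and `error_type` for the exception class your SDK version raises. The message check relies on the "maximum context length" wording from the stack trace above, so that unrelated request errors are not swallowed.

```python
from typing import Callable, List, Sequence, Type


def embed_with_split_fallback(
    text: str,
    embed: Callable[[str], Sequence[float]],
    split: Callable[[str], List[str]],
    error_type: Type[Exception],
) -> List[Sequence[float]]:
    """Embed `text` in one call; if it is rejected as too long, embed its chunks instead."""
    try:
        return [embed(text)]
    except error_type as exc:
        # Only handle the "input too long" case; re-raise any other request error.
        if "maximum context length" not in str(exc):
            raise
        return [embed(chunk) for chunk in split(text)]
```

The function always returns a list of vectors (one for the whole text, or one per chunk), so callers handle both outcomes the same way. Checking the token count up front is still cheaper, since this path spends a failed API call before retrying.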
Prevention
Build token counting into your preprocessing pipeline, using the same tokenizer as the embedding model, so that input length limits are enforced before any API call is made.