OpenAIError
openai.OpenAIError (context length exceeded max tokens)
Stack trace
openai.OpenAIError: The request was rejected because it exceeded the maximum allowed tokens in the context window.
Why it happens
OpenAI models have a fixed maximum context length (token limit) that includes both the prompt and the generated completion. When the total tokens exceed this limit, the API rejects the request with this error. This often happens with very long prompts or when requesting large completions.
Detection
Count the prompt tokens and add the requested completion tokens before sending each request, and log both numbers. Catch OpenAIError to detect context length failures early; in the v1 Python SDK a rejected request typically surfaces as its subclass openai.BadRequestError.
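One way to tell a context length failure apart from other API errors is to inspect the caught exception. The sketch below assumes the exception carries a `code` attribute equal to `context_length_exceeded`, and falls back to matching phrases in the message text. `is_context_length_error` is a hypothetical helper name and the marker strings are assumptions, not an official list.

```python
def is_context_length_error(exc: Exception) -> bool:
    """Best-effort check for a context length rejection."""
    # Assumption: SDK errors expose a machine-readable `code` attribute.
    if getattr(exc, "code", None) == "context_length_exceeded":
        return True
    # Fallback: look for typical phrases in the error message.
    message = str(exc).lower()
    markers = ("context length", "context window", "maximum allowed tokens")
    return any(marker in message for marker in markers)
```

Inside an `except OpenAIError as exc:` block, calling `is_context_length_error(exc)` lets you log the token counts for these failures and re-raise everything else unchanged.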
Causes & fixes
Cause: The prompt is too long, so prompt tokens plus the expected completion length exceed the model's maximum context length.
Fix: Shorten the prompt by removing unnecessary text or summarizing content until the total fits within the model's context length.
Cause: The max_tokens parameter is set too high, so prompt tokens plus max_tokens exceed the model's limit.
Fix: Lower max_tokens so that the sum of prompt tokens and max_tokens stays within the model's context window.
Cause: The model's context window is smaller than your prompt and completion require.
Fix: Switch to a model with a larger context window, such as gpt-4o or gpt-4o-mini.
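The fixes above all come down to keeping prompt tokens plus max_tokens within the context window. A minimal sketch of that rule follows. `clamp_max_tokens` and `safety_margin` are hypothetical names, and the 128,000-token window used in the usage note is an assumption to check against the documentation for your model.

```python
def clamp_max_tokens(
    prompt_tokens: int,
    requested_max_tokens: int,
    context_window: int,
    safety_margin: int = 16,
) -> int:
    """Return the largest max_tokens value that still fits the context window."""
    # Reserve a small margin because chat formatting adds a few tokens per message.
    available = context_window - prompt_tokens - safety_margin
    if available <= 0:
        # Lowering max_tokens cannot help here; the prompt itself must shrink.
        raise ValueError(
            f"Prompt uses {prompt_tokens} tokens and leaves no room in a "
            f"{context_window}-token context window"
        )
    return min(requested_max_tokens, available)
```

For example, `clamp_max_tokens(127_000, 2_000, 128_000)` returns 984, since only 984 tokens remain after the prompt and the margin. A `ValueError` signals that the prompt has to be shortened or a larger model chosen.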
Code: broken vs fixed
# Broken: the prompt plus max_tokens exceeds the model's context window
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': 'A very long prompt that exceeds the token limit...'}],
    max_tokens=2000,  # prompt tokens + max_tokens must fit in the context window; here they do not
)  # raises OpenAIError
print(response.choices[0].message.content)

# Fixed: a shorter prompt and a smaller max_tokens
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
# Reduced prompt length and max_tokens to fit the context window
short_prompt = 'A concise prompt that fits token limits.'
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': short_prompt}],
    max_tokens=500,  # reduced so prompt tokens + max_tokens stay within the context window
)
print(response.choices[0].message.content)  # fixed: no context length error
Workaround
Catch the OpenAIError exception, then programmatically truncate or summarize the prompt and retry the request with fewer tokens.
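The catch, truncate, and retry loop described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: `send` stands in for whatever function wraps your API call, `is_too_long` is a predicate you supply to recognise the context length error, and `complete_with_truncation` and `keep_ratio` are hypothetical names. Truncation is done by characters and keeps the end of the prompt, which is a crude choice; summarizing or dropping the oldest messages may suit your data better.

```python
from typing import Callable


def complete_with_truncation(
    send: Callable[[str], str],
    prompt: str,
    is_too_long: Callable[[Exception], bool],
    max_attempts: int = 4,
    keep_ratio: float = 0.5,
) -> str:
    """Call send(prompt); on a context length error, shorten the prompt and retry."""
    current = prompt
    for attempt in range(max_attempts):
        try:
            return send(current)
        except Exception as exc:
            # Re-raise unrelated errors, and give up after the last attempt.
            if not is_too_long(exc) or attempt == max_attempts - 1:
                raise
            # Keep the tail of the prompt (the most recent text) and drop the rest.
            keep = max(1, int(len(current) * keep_ratio))
            current = current[-keep:]
    raise ValueError("max_attempts must be at least 1")
```

With the default `keep_ratio` of 0.5, each retry halves the prompt, so a 400-character prompt is retried at 200 and then 100 characters before the attempts run out.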
Prevention
Count tokens before every request, using a tokenizer library such as tiktoken or a heuristic estimate, and enforce limits on the prompt and on max_tokens so that their sum never exceeds the model's context length.
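A pre-flight check along these lines might look like the sketch below. It uses a rough heuristic of about four characters per token for English text unless you pass in a real tokenizer. `estimate_tokens`, `check_request`, and `per_message_overhead` are hypothetical names, and the overhead of 4 tokens per message is an assumption rather than a documented constant.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Encoder = Callable[[str], Sequence[int]]


def estimate_tokens(text: str, encode: Optional[Encoder] = None) -> int:
    """Count tokens with a real encoder if given, else estimate from length."""
    if encode is not None:
        return len(encode(text))
    # Heuristic: roughly 4 characters per token for English text.
    return math.ceil(len(text) / 4)


def check_request(
    messages: List[Dict[str, str]],
    max_tokens: int,
    context_window: int,
    encode: Optional[Encoder] = None,
    per_message_overhead: int = 4,
) -> int:
    """Raise ValueError if the request cannot fit; otherwise return prompt tokens."""
    prompt_tokens = sum(
        estimate_tokens(m["content"], encode) + per_message_overhead
        for m in messages
    )
    total = prompt_tokens + max_tokens
    if total > context_window:
        raise ValueError(
            f"Request needs about {total} tokens but the context window "
            f"is {context_window}"
        )
    return prompt_tokens
```

If tiktoken is installed, passing `encode=tiktoken.get_encoding("o200k_base").encode` should give exact counts for the gpt-4o family, assuming that encoding matches your model. The heuristic can undercount for code or non-English text, so leave some headroom when relying on it.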