OpenAIError
openai.OpenAIError (context length exceeded max tokens)
Stack trace
openai.OpenAIError: This model's maximum context length is 8192 tokens, however you requested 9000 tokens (input tokens plus completion tokens). Please reduce your prompt or completion length.
Why it happens
OpenAI models have a fixed maximum context window size (token limit) that includes both prompt and completion tokens. When the total tokens exceed this limit, the API returns this error. This often happens when prompts are too long or when the max_tokens parameter is set too high.
Detection
Monitor token usage by counting tokens in your prompt plus max_tokens before sending requests. Use OpenAI tokenizer tools or SDK utilities to estimate token counts and prevent exceeding limits.
Causes & fixes
Prompt text is too long and exceeds the model's maximum context window when combined with max_tokens.
Shorten the prompt by removing unnecessary text or summarizing content to fit within the token limit.
max_tokens parameter is set too high, causing total tokens to exceed the model limit.
Reduce the max_tokens parameter to ensure the sum of prompt tokens and max_tokens stays within the model's context window.
Using a model with a smaller context window than required for your use case.
Switch to a model with a larger context window, such as gpt-4o or gpt-4o-mini, which support more tokens.
Code: broken vs fixed
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
response = client.chat.completions.create(
model='gpt-4o',
messages=[{'role': 'user', 'content': 'A' * 8000}], # Very long prompt
max_tokens=2000 # Too large, causes context length exceeded error
) # This line triggers the error import os
from openai import OpenAI
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
# Reduced prompt length and max_tokens to fit context window
response = client.chat.completions.create(
model='gpt-4o',
messages=[{'role': 'user', 'content': 'A' * 6000}], # Shortened prompt
max_tokens=1000 # Reduced max_tokens
)
print(response.choices[0].message.content) # Works without error Workaround
Catch OpenAIError exceptions, then programmatically truncate or summarize the prompt and retry the request with fewer tokens.
Prevention
Implement token counting before requests using OpenAI tokenizer utilities and enforce limits on prompt and max_tokens to never exceed the model's context window.