ContextLengthExceededError
openai.error.ContextLengthExceededError
Stack trace
openai.error.ContextLengthExceededError: The combined length of the messages exceeds the model's maximum context length. Please reduce the prompt size or use a model with a larger context window.
Why it happens
OpenAI models have a fixed maximum context length, measured in tokens, that must hold the entire request: the system prompt, every user and assistant message in the conversation, and the tokens reserved for the completion. When the system prompt is too long, the total token count exceeds this limit and the request fails with ContextLengthExceededError.
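As a rough budget, a request fits only when the inequality below holds. Here C is the model's context window (128,000 tokens for gpt-4o-mini), T_overhead stands for the few formatting tokens the chat format adds per message (an approximation, not a documented constant), and T_completion is the number of tokens reserved for the reply. Rearranged, it gives the largest system prompt that still fits.

```latex
\begin{aligned}
T_{\text{system}} + T_{\text{history}} + T_{\text{user}} + T_{\text{overhead}} + T_{\text{completion}} &\le C \\
\Rightarrow\quad T_{\text{system}} &\le C - \left(T_{\text{history}} + T_{\text{user}} + T_{\text{overhead}} + T_{\text{completion}}\right)
\end{aligned}
```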
Detection
Monitor token usage before sending requests: encode the prompts with tiktoken (or a similar tokenizer), add the tokens reserved for the completion, and check that the total does not exceed the model's maximum context length.
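The token count can be sketched in Python as follows. This is a minimal sketch that assumes the per-message overhead constants from OpenAI's token-counting cookbook (3 tokens per message, 1 per name field, 3 to prime the reply) and the o200k_base encoding for the gpt-4o family. Both are approximations that you should verify for your model. count_chat_tokens and tiktoken_counter are hypothetical helper names. The tokenizer is passed in as a function so that the counting logic runs even when tiktoken is not installed.

```python
from typing import Callable, Dict, List

# Approximate overheads from OpenAI's "How to count tokens with tiktoken"
# cookbook; estimates, not exact figures guaranteed by the API.
TOKENS_PER_MESSAGE = 3
TOKENS_PER_NAME = 1
REPLY_PRIMING_TOKENS = 3


def tiktoken_counter(encoding_name: str = "o200k_base") -> Callable[[str], int]:
    """Return a text -> token-count function backed by tiktoken.

    Assumes the gpt-4o family uses o200k_base (tiktoken >= 0.7). Imported
    lazily so the rest of this sketch works without tiktoken installed.
    """
    import tiktoken

    enc = tiktoken.get_encoding(encoding_name)
    return lambda text: len(enc.encode(text))


def count_chat_tokens(
    messages: List[Dict[str, str]], count_text: Callable[[str], int]
) -> int:
    """Estimate how many prompt tokens a chat request will consume."""
    total = REPLY_PRIMING_TOKENS
    for message in messages:
        total += TOKENS_PER_MESSAGE
        for key, value in message.items():
            total += count_text(value)  # counts both the role and the content
            if key == "name":
                total += TOKENS_PER_NAME
    return total
```

For a gpt-4o-mini request, you would compute count_chat_tokens(messages, tiktoken_counter()), add the tokens reserved for the reply, and compare the sum against the 128,000-token window.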
Causes & fixes
System prompt text is excessively long, consuming most of the model's context window.
Shorten or simplify the system prompt to reduce token usage, focusing on essential instructions only.
Accumulated conversation history plus system prompt exceeds the model's token limit.
Implement conversation history truncation or summarization to keep total tokens within limits.
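History truncation can be sketched as a loop that drops the oldest turns until the prompt fits. This minimal sketch assumes that system messages come first in the list and that you supply a count_tokens function over a whole message list, for example one built on tiktoken. truncate_history is a hypothetical name. Summarization, the other option mentioned above, is not shown.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def truncate_history(
    messages: List[Message],
    count_tokens: Callable[[List[Message]], int],
    budget: int,
) -> List[Message]:
    """Drop the oldest non-system messages until the prompt fits in budget tokens.

    Assumes system messages come first. The system messages and the most recent
    message are always kept; if they alone exceed the budget, ValueError is
    raised because truncating history cannot help.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Remove the oldest turns first, but never the latest message.
    while len(rest) > 1 and count_tokens(system + rest) > budget:
        rest = rest[1:]
    trimmed = system + rest
    if count_tokens(trimmed) > budget:
        raise ValueError("system prompt plus latest message exceed the token budget")
    return trimmed
```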
Using a model with a small maximum context window for a large prompt.
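Switch to a model with a larger context window. gpt-4o has the same 128K-token window as gpt-4o-mini, so switching between those two does not help. gpt-4.1 (about 1M tokens) offers more room, as does gemini-2.5-pro, which is served by Google rather than OpenAI.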
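You can automate the switch by picking the model with the smallest window that still fits the counted prompt plus the reserved reply. The context-window figures below are assumptions to check against OpenAI's current model documentation, and pick_model is a hypothetical helper. Models are listed cheapest first, so ties resolve to the cheaper one.

```python
from typing import Dict, Optional

# Context windows in tokens; assumed values, verify against current docs.
CONTEXT_WINDOWS: Dict[str, int] = {
    "gpt-4o-mini": 128_000,
    "gpt-4o": 128_000,
    "gpt-4.1": 1_047_576,
}


def pick_model(
    required_tokens: int, windows: Dict[str, int] = CONTEXT_WINDOWS
) -> Optional[str]:
    """Return the model with the smallest window that fits, or None if none does."""
    fitting = [name for name, size in windows.items() if size >= required_tokens]
    if not fitting:
        return None
    # min() keeps the first of equal-sized windows, so list cheaper models first.
    return min(fitting, key=lambda name: windows[name])
```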
Code: broken vs fixed
# Broken version
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
system_prompt = """Very long system prompt text that exceeds the model's context window..."""
user_message = "Hello, how are you?"
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message}
    ]
)  # This call fails: the prompt exceeds the context window (openai.BadRequestError in the v1 SDK)
print(response.choices[0].message.content)
# Fixed version
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
# Shortened system prompt to fit the context window
system_prompt = "Please answer concisely and clearly."
user_message = "Hello, how are you?"
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message}
    ]
)  # The request now fits well within gpt-4o-mini's context window
print(response.choices[0].message.content)
Workaround
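Catch the context-length error and retry with a smaller request. In the openai v1 Python SDK it is raised as openai.BadRequestError with the code "context_length_exceeded". Before retrying, programmatically truncate or summarize the conversation history, or shorten the system prompt.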
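The catch-and-retry workaround can be sketched as a wrapper that shrinks the messages after each context-length failure. This sketch assumes the openai v1 Python SDK, where the raised error carries a code attribute equal to "context_length_exceeded". The check is duck-typed on that attribute, so the sketch runs without the SDK. send_with_shrinking and is_context_length_error are hypothetical names, and shrink stands for whatever truncation or summarization step you choose.

```python
from typing import Callable, Dict, List, TypeVar

Message = Dict[str, str]
T = TypeVar("T")


def is_context_length_error(exc: Exception) -> bool:
    """True if exc looks like the API's context-length rejection (assumed code value)."""
    return getattr(exc, "code", None) == "context_length_exceeded"


def send_with_shrinking(
    send: Callable[[List[Message]], T],
    messages: List[Message],
    shrink: Callable[[List[Message]], List[Message]],
    max_attempts: int = 3,
) -> T:
    """Call send; on a context-length error, shrink the messages and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send(messages)
        except Exception as exc:
            # Re-raise unrelated errors, and give up after the last attempt.
            if attempt == max_attempts or not is_context_length_error(exc):
                raise
            messages = shrink(messages)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

With the client from the code example, send could be lambda msgs: client.chat.completions.create(model="gpt-4o-mini", messages=msgs), and shrink could drop the oldest non-system message. Capping the number of attempts prevents an endless loop when a prompt cannot be shrunk enough.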
Prevention
Use token counting libraries like tiktoken to monitor prompt length dynamically and enforce limits before sending requests to the API.
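Enforcing the limit before sending can be sketched as a guard that fails locally. This avoids spending a round trip on a request the API will reject. The count_tokens callable is assumed to be tiktoken-based, and ensure_fits and PromptTooLongError are hypothetical names.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class PromptTooLongError(ValueError):
    """Raised locally before sending a request that would overflow the window."""


def ensure_fits(
    messages: List[Message],
    count_tokens: Callable[[List[Message]], int],
    context_window: int,
    max_output_tokens: int,
) -> int:
    """Return the remaining token headroom, or raise before any API call is made."""
    prompt_tokens = count_tokens(messages)
    headroom = context_window - prompt_tokens - max_output_tokens
    if headroom < 0:
        raise PromptTooLongError(
            f"prompt uses {prompt_tokens} tokens and {max_output_tokens} are reserved "
            f"for the reply, exceeding the {context_window}-token window by {-headroom}"
        )
    return headroom
```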