ContextLengthExceededError
openai.ContextLengthExceededError
Stack trace
openai.ContextLengthExceededError: The input tokens length exceeds the model's maximum context window size of 8192 tokens.
Why it happens
Each OpenAI model has a fixed context window: a cap on the total number of tokens in a single request, covering the prompt and the generated output. When the tokenized input exceeds this limit, the API rejects the request with a ContextLengthExceededError. This typically happens when long documents or concatenated inputs are sent without chunking or summarization.
Detection
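Count tokens before sending each request, using a tokenizer that matches the target model (for example, the tiktoken library). Log input token counts, and catch ContextLengthExceededError so that oversized inputs are recorded and handled rather than surfacing as unexplained failures.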
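A pre-flight check of this kind might look like the sketch below. It assumes a rough heuristic of about four characters per token for English text, which is only an approximation; the names estimate_tokens, fits_in_context, and reserved_for_output are illustrative and not part of any SDK. For exact counts, replace estimate_tokens with a real tokenizer.

```python
import logging
import math

logger = logging.getLogger(__name__)

# Assumption: roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Crude token estimate; swap in a real tokenizer for exact counts."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(messages, context_window=8192, reserved_for_output=1024):
    """Return True if the estimated input leaves room for the model's reply.

    `messages` is a list of {"role": ..., "content": ...} dicts.
    `reserved_for_output` is a hypothetical safety margin for the response.
    """
    total = sum(estimate_tokens(m["content"]) for m in messages)
    budget = context_window - reserved_for_output
    logger.info("estimated input tokens: %d (budget %d)", total, budget)
    return total <= budget
```

Calling fits_in_context before each request lets the application chunk or truncate early instead of waiting for the API to reject the input.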
Causes & fixes
Cause: The input document is longer than the model's context window.
Fix: Split the document into chunks that each fit within the model's token limit, then summarize the chunks.
Cause: Multiple documents or prompt instructions are concatenated without accounting for the token limit.
Fix: Calculate the total token count, including both prompt and document, then truncate or chunk the input to stay within the limit.
Cause: The chosen model has a smaller context window than the document requires.
Fix: Switch to a model with a larger context window, such as gpt-4o. gemini-2.5-pro also accepts longer inputs, but it is a Google model served through a different API.
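The chunking fixes above can be sketched with a tokenizer passed in as a pair of functions. This is a minimal illustration, not a definitive implementation: chunk_by_tokens and document_budget are hypothetical helpers, and the word-based tokenizer is only a stand-in so the example runs without extra dependencies. In practice, encode and decode would come from the tokenizer that matches the model.

```python
def document_budget(context_window, prompt_tokens, reserved_for_output):
    """Tokens left for the document after the prompt and the reply are accounted for."""
    budget = context_window - prompt_tokens - reserved_for_output
    if budget <= 0:
        raise ValueError("prompt and reserved output already fill the context window")
    return budget


def chunk_by_tokens(text, encode, decode, max_tokens):
    """Split text into pieces of at most max_tokens tokens each.

    `encode` maps text to a list of tokens; `decode` maps tokens back to text.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = encode(text)
    return [
        decode(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


# Stand-in tokenizer for demonstration only: one token per whitespace-separated word.
def word_encode(text):
    return text.split()


def word_decode(tokens):
    return " ".join(tokens)


chunks = chunk_by_tokens("a b c d e", word_encode, word_decode, max_tokens=2)
```

Splitting on token boundaries rather than character offsets means each chunk's size is measured in the same unit the API enforces.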
Code: broken vs fixed
# Broken: the whole document is sent in a single request
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
long_text = """Very long document text exceeding context window..."""

# This call triggers ContextLengthExceededError
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": long_text}]
)
print(response.choices[0].message.content)

# Fixed: the document is chunked and each chunk is summarized separately
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

def chunk_text(text, max_chars=4000):
    # Simple chunking by character count, which is only a rough proxy for tokens.
    # In production, use a tokenizer to count tokens precisely.
    return [text[i:i+max_chars] for i in range(0, len(text), max_chars)]

long_text = """Very long document text exceeding context window..."""
chunks = chunk_text(long_text, max_chars=4000)
summaries = []
for chunk in chunks:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize the following text:\n\n{chunk}"}]
    )
    summaries.append(response.choices[0].message.content)

final_summary = "\n".join(summaries)
print(final_summary)  # Fixed: chunked input to avoid context length error
Workaround
Catch ContextLengthExceededError and fall back to chunking the input text manually: summarize each chunk separately, then combine the results.
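One way to structure that fallback is sketched below. How the overflow surfaces depends on the SDK and version in use, so the check is left to a caller-supplied predicate; summarize_with_fallback, summarize, and is_context_error are hypothetical names introduced for illustration. The chunk size is measured in characters here as a simplifying assumption.

```python
def summarize_with_fallback(text, summarize, is_context_error, chunk_chars=8000):
    """Try to summarize in one request; on a context-length error, chunk and retry.

    `summarize` is any callable that takes text and returns a summary.
    `is_context_error` decides whether an exception means the input was too long.
    """
    try:
        return summarize(text)
    except Exception as exc:
        if not is_context_error(exc):
            raise  # unrelated failure: do not mask it

    # Fallback path: summarize fixed-size pieces and join the partial summaries.
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    return "\n".join(summarize(chunk) for chunk in chunks)
```

Keeping the predicate separate means unrelated errors such as authentication or rate-limit failures still propagate instead of triggering a pointless chunked retry.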
Prevention
Implement token counting and chunking logic before sending requests. Prefer models with larger context windows for long documents. When a document still does not fit, use hierarchical summarization: summarize each chunk, then summarize the combined summaries.
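Hierarchical summarization can be sketched as a short recursive function. This is an illustrative outline under stated assumptions: hierarchical_summary and its summarize argument are hypothetical, sizes are measured in characters rather than tokens for simplicity, and the guard assumes each pass is expected to shrink the text.

```python
def hierarchical_summary(text, summarize, max_chars=8000):
    """Summarize text of any length by summarizing chunks, then the summaries.

    `summarize` is any callable that takes text and returns a shorter summary.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return summarize(text)  # base case: fits in a single request

    chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    combined = "\n".join(summarize(chunk) for chunk in chunks)

    # Guard against infinite recursion if summaries are not getting shorter.
    if len(combined) >= len(text):
        raise RuntimeError("summaries did not shrink the text; aborting")

    return hierarchical_summary(combined, summarize, max_chars)
```

Each level reduces the text until it fits in one request, so the final call produces a single summary rather than a concatenation of partial ones.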