ContextWindowTooLongError
anthropic.errors.ContextWindowTooLongError
Stack trace
anthropic.errors.ContextWindowTooLongError: The prompt plus completion tokens exceed the model's maximum context window size of 9000 tokens.
Why it happens
Anthropic models have a fixed maximum context window size (e.g., 9000 tokens). When the prompt tokens plus the requested completion length (max_tokens) exceed this limit, the API rejects the request with this error instead of processing the oversized input.
Detection
Before sending each request, count the tokens in your prompt and add the requested completion length (max_tokens). Log the total and alert when it approaches the model's context window limit.
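The check described above can be sketched as a small pre-flight function. This is a minimal illustration, not part of any SDK: `check_budget`, `CONTEXT_WINDOW`, and `WARN_RATIO` are hypothetical names, and the 9000-token limit is taken from the stack trace in this document rather than from any model's published specification. The prompt token count is passed in as an integer; recent versions of the anthropic Python SDK appear to offer a `client.messages.count_tokens` call for obtaining it, but verify that against the version you have installed.

```python
import logging

logger = logging.getLogger("context_budget")

# Assumption: 9000 mirrors the limit quoted in the stack trace above;
# substitute the documented window for the model you actually call.
CONTEXT_WINDOW = 9000
WARN_RATIO = 0.9  # hypothetical alert threshold: warn at 90% of the window


def check_budget(prompt_tokens: int, max_tokens: int,
                 context_window: int = CONTEXT_WINDOW) -> str:
    """Classify a request as 'ok', 'warn', or 'over' before it is sent."""
    total = prompt_tokens + max_tokens
    if total > context_window:
        logger.error("request needs %d tokens, window is %d", total, context_window)
        return "over"
    if total >= WARN_RATIO * context_window:
        logger.warning("request uses %d of %d tokens", total, context_window)
        return "warn"
    return "ok"
```

A caller would skip or shrink any request classified as "over" and route "warn" results to whatever alerting is already in place.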
Causes & fixes
Cause: The prompt is too long, so the prompt tokens plus the requested completion exceed the model's token limit.
Fix: Reduce the prompt length by summarizing or chunking the input text, or decrease the max_tokens parameter for the completion.
Cause: The requested completion length (max_tokens) is too large for the size of the prompt.
Fix: Lower max_tokens so that prompt tokens plus max_tokens fit within the model's context window.
Cause: Token overhead from system messages or metadata in the prompt is not accounted for.
Fix: Include all tokens from system, user, and assistant messages in token count calculations to stay within limits.
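The fixes above amount to simple arithmetic: total the tokens across the system prompt and every message, then cap max_tokens at whatever room is left. The sketch below assumes a rough heuristic of about 4 characters per token, which is only an approximation and not the model's real tokenizer. `estimate_tokens`, `prompt_tokens`, and `clamp_max_tokens` are hypothetical helper names, and message content is assumed to be a plain string.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, (len(text) + 3) // 4)


def prompt_tokens(system: str, messages: list[dict]) -> int:
    """Sum estimated tokens over the system prompt and every message."""
    total = estimate_tokens(system) if system else 0
    for message in messages:
        # Assumes each message's content is a plain string.
        total += estimate_tokens(message["content"])
    return total


def clamp_max_tokens(used: int, requested: int, context_window: int) -> int:
    """Largest max_tokens that still fits; 0 means the prompt alone is too long."""
    return max(0, min(requested, context_window - used))
```

For example, with 8500 prompt tokens and a 9000-token window, a requested max_tokens of 1000 is clamped to 500. A result of 0 signals that the prompt itself must be shortened first.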
Code: broken vs fixed
# Broken: the prompt plus max_tokens exceeds the context window
import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ['ANTHROPIC_API_KEY'])
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Very long prompt text exceeding context window..."}],
    max_tokens=1000,  # Prompt tokens plus this value overflow the window and trigger ContextWindowTooLongError
)
print(response.content)

# Fixed: shortened prompt and reduced max_tokens to fit the context window
import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ['ANTHROPIC_API_KEY'])
short_prompt = "Summarized or chunked prompt text within token limits."
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": short_prompt}],
    max_tokens=500,  # Reduced to avoid context window overflow
)
print(response.content)  # Fixed: no ContextWindowTooLongError
Workaround
Catch ContextWindowTooLongError, then programmatically truncate or chunk the prompt and retry the request with a smaller max_tokens.
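One way to sketch this catch-and-retry loop is shown below. To keep the example runnable without network access, the exception class is a local stand-in that only borrows the name used in this document, and `send` is a hypothetical callable wrapping your real API call. Substitute whichever exception class your installed SDK version actually raises for this condition. Halving the prompt and max_tokens on each attempt is an arbitrary illustrative policy, not a recommended one.

```python
class ContextWindowTooLongError(Exception):
    """Local stand-in for the error named in this document (assumption)."""


def call_with_truncation(send, prompt: str, max_tokens: int,
                         max_attempts: int = 4, min_max_tokens: int = 64):
    """Call send(prompt, max_tokens); on overflow, shrink both and retry."""
    for _ in range(max_attempts):
        try:
            return send(prompt, max_tokens)
        except ContextWindowTooLongError:
            # Keep the first half of the prompt and ask for a shorter completion.
            prompt = prompt[: max(1, len(prompt) // 2)]
            max_tokens = max(min_max_tokens, max_tokens // 2)
    # Give up after max_attempts so a persistent failure is not hidden.
    raise ContextWindowTooLongError("request still too long after truncation")
```

Truncating from the end discards the latest text, which is often the wrong part to lose in a conversation. Summarizing or dropping the oldest messages may suit chat-style prompts better.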
Prevention
Implement token counting before API calls and design prompts to stay well below the model's context window limit, using chunking or summarization as needed.
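Chunking, as mentioned above, might look like the following greedy splitter. It is a sketch under stated assumptions: `estimate_tokens` uses a rough 4-characters-per-token heuristic in place of a real tokenizer, and `chunk_text` is a hypothetical helper that splits on whitespace only. Choose `max_chunk_tokens` well below the context window so that the system prompt and max_tokens still fit alongside each chunk.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, (len(text) + 3) // 4)


def chunk_text(text: str, max_chunk_tokens: int) -> list[str]:
    """Greedily pack whitespace-separated words into chunks under the budget.

    A single word larger than the budget becomes its own oversized chunk.
    """
    chunks, current = [], []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and estimate_tokens(candidate) > max_chunk_tokens:
            # Adding this word would overflow, so close the current chunk.
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be sent as its own request, with the partial results combined afterwards.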