ContextWindowExceededError
anthropic.errors.ContextWindowExceededError
Stack trace
anthropic.errors.ContextWindowExceededError: Request tokens (input + output) exceed the model's maximum context window size of 9000 tokens.
Why it happens
Each Claude model has a fixed maximum context window. The trace above reports a limit of 9000 tokens, but the actual limit depends on the model. When the prompt tokens plus the requested completion tokens (max_tokens) exceed this limit, the API rejects the request with this error instead of processing it.
Detection
Before sending a request, add the prompt's token count to the requested max_tokens and compare the sum against the model's context window. Log the token counts for each request, and catch ContextWindowExceededError to identify calls that went over the limit.
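The pre-flight check described above can be sketched roughly as follows. This is a minimal sketch, not the library's own mechanism: estimate_tokens is a hypothetical stand-in that assumes about 4 characters per token, and CONTEXT_WINDOW simply reuses the 9000-token figure from the trace. Real counts should come from an actual tokenizer or token-counting facility for the model in use.

```python
import logging

logger = logging.getLogger("token_budget")

# Assumption: limit taken from the trace above; the real value depends on the model.
CONTEXT_WINDOW = 9000


def estimate_tokens(text: str) -> int:
    """Hypothetical heuristic: roughly 4 characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


def fits_context_window(messages, max_tokens, window=CONTEXT_WINDOW):
    """Return True if estimated prompt tokens plus max_tokens fit in the window."""
    prompt_tokens = sum(estimate_tokens(m["content"]) for m in messages)
    total = prompt_tokens + max_tokens
    # Log the counts so over-limit calls can be spotted in the logs.
    logger.info(
        "prompt=%d max_tokens=%d total=%d window=%d",
        prompt_tokens, max_tokens, total, window,
    )
    return total <= window
```

A caller would run fits_context_window(messages, max_tokens) before each request and shorten the prompt or lower max_tokens when it returns False.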
Causes & fixes
Cause: The input prompt plus the requested completion tokens exceed Claude's maximum context window (e.g., 9000 tokens).
Fix: Reduce the input prompt length or lower the max_tokens parameter so that the total stays within the model's context window.
Cause: Repeatedly appending conversation history without truncation makes the token count grow beyond the limit.
Fix: Truncate or summarize the conversation history to keep prompt tokens under the context window limit.
Cause: The model variant has a smaller context window than expected (e.g., Claude 1 vs Claude 2).
Fix: Verify that the model variant supports the desired context window size, and switch to a model with a larger context window if needed.
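The history-truncation fix could look something like the sketch below. The helper names (estimate_tokens, truncate_history) are hypothetical, the 4-characters-per-token estimate is an assumption rather than a real tokenizer, and dropping a leading assistant message assumes the conversation should begin with a user turn.

```python
def estimate_tokens(text: str) -> int:
    """Hypothetical heuristic: roughly 4 characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


def truncate_history(messages, max_prompt_tokens):
    """Drop the oldest messages until the estimated prompt fits the budget.

    The most recent message is always kept, even if it alone is over budget.
    """
    kept = []
    used = 0
    # Walk from newest to oldest so the most recent context survives.
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if kept and used + cost > max_prompt_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    # Assumption: the conversation should start with a user turn.
    while len(kept) > 1 and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

Summarizing the dropped messages instead of discarding them is the other option mentioned above; it preserves more context at the cost of an extra model call.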
Code: broken vs fixed
# Broken: prompt tokens plus max_tokens exceed the context window
import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ['ANTHROPIC_API_KEY'])
response = client.messages.create(
    system="",
    model="claude-2",
    messages=[{"role": "user", "content": "A very long prompt that exceeds the context window..."}],
    max_tokens=5000,  # Combined with the prompt tokens, this exceeds the limit
)  # This call triggers ContextWindowExceededError

# Fixed: reduced prompt length and max_tokens to fit the context window
import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ['ANTHROPIC_API_KEY'])
response = client.messages.create(
    system="",
    model="claude-2",
    messages=[{"role": "user", "content": "A shorter prompt that fits within the context window"}],
    max_tokens=1000,  # Reduced to avoid exceeding the token limit
)
print(response.content)

Workaround
Catch ContextWindowExceededError and programmatically truncate or summarize the prompt before retrying the request.
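A catch-and-retry loop along these lines is one way to apply the workaround. To keep the sketch runnable offline, it defines a local stand-in exception with the same name as the error in this entry and a hypothetical fake_send stub in place of a real API call. In real code, send would wrap the client call and overflow_error would be whatever exception your SDK version actually raises.

```python
class ContextWindowExceededError(Exception):
    """Local stand-in for the error named in this entry (assumption for the sketch)."""


def send_with_truncation(send, messages, max_tokens,
                         overflow_error=ContextWindowExceededError,
                         max_retries=3):
    """Call send(messages, max_tokens); on overflow, drop the oldest message and retry."""
    attempt = list(messages)
    for retry in range(max_retries + 1):
        try:
            return send(attempt, max_tokens)
        except overflow_error:
            # Give up when retries are exhausted or nothing is left to drop.
            if retry == max_retries or len(attempt) <= 1:
                raise
            attempt = attempt[1:]  # drop the oldest message


def fake_send(messages, max_tokens, window=9000):
    """Hypothetical stub that mimics the over-limit failure without any network call."""
    prompt_tokens = sum(len(m["content"]) // 4 for m in messages)
    if prompt_tokens + max_tokens > window:
        raise ContextWindowExceededError("request exceeds the context window")
    return f"ok with {len(messages)} message(s)"
```

Dropping one message at a time keeps the sketch short; dropping whole user/assistant pairs, or summarizing the removed turns, may suit real conversations better.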
Prevention
Track token usage client-side by encoding prompts and counting tokens before requests; implement prompt truncation or summarization to guarantee requests never exceed the model's context window.
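One way to enforce the limit client-side is to cap max_tokens at whatever room the prompt leaves. The sketch below assumes prompt_tokens has already been obtained from a real token counter. The function name, the 9000-token default (taken from the trace), and the safety margin are illustrative assumptions, not part of any library.

```python
def clamp_max_tokens(prompt_tokens, requested_max_tokens,
                     window=9000, safety_margin=64):
    """Return the largest max_tokens that keeps the request inside the window.

    Raises ValueError when the prompt alone leaves no room for output,
    which signals that the prompt must be truncated or summarized first.
    """
    # The margin absorbs small differences between estimated and actual counts.
    available = window - safety_margin - prompt_tokens
    if available < 1:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens; no room left in a {window}-token window"
        )
    return min(requested_max_tokens, available)
```

For example, with a 4500-token prompt and a requested max_tokens of 5000, the defaults leave 9000 - 64 - 4500 = 4436 tokens, so the function returns 4436 instead of letting the request exceed the window.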