ContextWindowExceededError
litellm.errors.ContextWindowExceededError
Stack trace
litellm.errors.ContextWindowExceededError: Context window exceeded across providers: total tokens 8500 > max allowed 8192
File "/app/main.py", line 42, in generate
response = client.chat.completions.create(model='gpt-4o', messages=messages)
File "/usr/local/lib/python3.9/site-packages/litellm/client.py", line 120, in create
raise ContextWindowExceededError(f"Context window exceeded across providers: total tokens {total_tokens} > max allowed {max_tokens}") Why it happens
LiteLLM manages multiple LLM providers and enforces a combined token limit for context windows. When the sum of tokens from all providers in a single request exceeds the maximum allowed context window size, this error is raised to prevent API rejections or truncated responses.
Detection
Monitor token usage per request across all providers and assert that the total tokens do not exceed the smallest context window limit before sending the request.
Causes & fixes
Combined token count from multiple providers exceeds the smallest provider's max context window size.
Reduce the input prompt length or split the request to ensure total tokens across providers stay within the smallest context window limit.
Using multiple providers with different max context windows without accounting for the lowest limit.
Query each provider's max context window and enforce the strictest limit when aggregating tokens across providers.
Not truncating or summarizing long conversation history before sending to providers.
Implement prompt truncation or summarization to keep token count within limits before calling LiteLLM APIs.
Code: broken vs fixed
from litellm import LiteLLMClient
client = LiteLLMClient()
messages = [{'role': 'user', 'content': 'Very long conversation or document...'}]
response = client.chat.completions.create(model='gpt-4o', messages=messages) # Raises ContextWindowExceededError import os
from litellm import LiteLLMClient
client = LiteLLMClient()
messages = [{'role': 'user', 'content': 'Shortened or truncated conversation content...'}] # Reduced token count
response = client.chat.completions.create(model='gpt-4o', messages=messages) # Fixed: within context window
print(response) Workaround
Catch ContextWindowExceededError and programmatically truncate or summarize the input prompt before retrying the request.
Prevention
Implement token counting and prompt length checks before sending requests to LiteLLM, always respecting the smallest max context window size among all providers used.