Fix LiteLLM context length exceeded error
A LiteLLM context length exceeded error occurs when the tokens in your input prompt, plus any prior conversation context, exceed the model's maximum context window. Fix it by truncating or chunking the input so it fits within the window, or by switching to a model with a larger context size.

Why this happens
The LiteLLM context length exceeded error is triggered when the combined token length of your input prompt and any prior context exceeds the model's maximum context window size. For example, if you use a model with a 2048-token limit but your prompt plus conversation history totals 2500 tokens, the error occurs.
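The arithmetic behind the error can be sketched as a simple budget check. The numbers below mirror the 2500-vs-2048 example above; a real check must count tokens with the model's actual tokenizer, not estimates:

```python
# Sketch of the context-window arithmetic; figures mirror the example above.
# In production, count tokens with the model's own tokenizer.

def fits_context(prompt_tokens: int, history_tokens: int, max_context: int) -> bool:
    """Return True if the prompt plus prior conversation fits the window."""
    return prompt_tokens + history_tokens <= max_context

print(fits_context(500, 2000, 2048))  # 2500 total tokens > 2048 -> False
print(fits_context(500, 1000, 2048))  # 1500 total tokens fits   -> True
```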
Typical triggering code looks like this (the model name is illustrative; LiteLLM's chat interface is litellm.completion):

from litellm import completion

prompt = """A very long input text that exceeds the model's maximum context length..."""
response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)

The call raises litellm.ContextWindowExceededError with a message along the lines of: Input tokens (2500) exceed max context length (2048).
The fix
To fix this, truncate or chunk your input so it fits within the model's context window. You can use a tokenizer to count tokens and trim the prompt accordingly. Alternatively, select a model with a larger context size if available.
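One way to "select a model with a larger context size" is a small lookup that picks the cheapest model whose window can hold the request. The model names and window sizes below are illustrative placeholders, not an authoritative list; check your provider's documentation for real limits:

```python
# Illustrative context-window table; verify real limits in your provider's docs.
CONTEXT_WINDOWS = {
    "small-model": 2048,
    "medium-model": 8192,
    "large-model": 128000,
}

def pick_model(input_tokens: int, completion_tokens: int) -> str:
    """Return the smallest configured model whose window fits input + completion."""
    needed = input_tokens + completion_tokens
    for model, window in sorted(CONTEXT_WINDOWS.items(), key=lambda kv: kv[1]):
        if needed <= window:
            return model
    raise ValueError(f"No configured model has a {needed}-token context window")

print(pick_model(2500, 512))  # needs 3012 tokens -> "medium-model"
```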
The corrected example below counts tokens before calling LiteLLM and truncates the prompt to fit. It uses the tiktoken tokenizer (the encoding used by OpenAI-family chat models); note that it truncates below the full 2048-token window so the model still has room to generate a completion:

from litellm import completion
import tiktoken

encoder = tiktoken.get_encoding("cl100k_base")  # encoding used by OpenAI chat models
prompt = """A very long input text that exceeds the model's maximum context length..."""
max_context = 2048
completion_budget = 512  # tokens reserved for the model's reply
max_input_tokens = max_context - completion_budget

tokens = encoder.encode(prompt)
if len(tokens) > max_input_tokens:
    tokens = tokens[:max_input_tokens]
    prompt = encoder.decode(tokens)

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # generated response text
Preventing it in production
Implement input validation to check token length before sending requests. Use chunking strategies to split large documents into smaller segments. Add retry logic or fallback to models with larger context windows when needed. Monitoring token usage helps avoid unexpected errors.
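The chunking strategy above can be sketched as follows. For brevity this splits on whitespace "words" as a stand-in for tokens; production code should measure each chunk with the model's real tokenizer (e.g., litellm.token_counter) since word counts only approximate token counts:

```python
def chunk_text(text: str, max_tokens: int) -> list[str]:
    """Split text into segments of at most max_tokens whitespace-delimited words.

    Whitespace words only approximate model tokens; use the model's
    tokenizer in production for exact counts.
    """
    words = text.split()
    return [
        " ".join(words[i : i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

chunks = chunk_text("one two three four five six seven", 3)
print(chunks)  # ['one two three', 'four five six', 'seven']
```

Each chunk can then be sent as a separate request, with the per-chunk results merged afterwards (e.g., summarize each chunk, then summarize the summaries).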
Key Takeaways
- Always check and truncate input tokens to fit the model's max context length before calling LiteLLM.
- Use tokenizers to accurately measure prompt length in tokens, not characters.
- Implement chunking and input validation to prevent context length errors in production.