ContextLengthExceededError
mistral.errors.ContextLengthExceededError
Stack trace
mistral.errors.ContextLengthExceededError: The input plus completion tokens exceed the model's maximum context length of 8192 tokens.
Why it happens
Each Mistral model has a fixed maximum context length: a single token budget shared by the prompt and the completion. If the prompt's token count plus the requested completion length (max_tokens) exceeds that budget, the request is rejected with this error instead of being processed.
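The budget rule comes down to one subtraction. The sketch below assumes you already know the prompt's token count. `context_headroom` is a hypothetical helper, and 8192 is the limit quoted in the error message above.

```python
def context_headroom(prompt_tokens: int, max_tokens: int, context_length: int = 8192) -> int:
    """Tokens left after the prompt and the requested completion.

    A negative result means the request would be rejected with
    ContextLengthExceededError.
    """
    return context_length - (prompt_tokens + max_tokens)


print(context_headroom(7800, 500))  # 8192 - 8300 -> -108, request too large
print(context_headroom(7692, 500))  # 8192 - 8192 -> 0, fits exactly
```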
Detection
Monitor token usage before sending each request. Tokenize the input, add the requested max_tokens, and log or alert when the total approaches or exceeds the model's maximum context length.
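Here is a minimal monitoring sketch using Python's standard `logging` module. The prompt's token count is passed in because it has to come from a Mistral-compatible tokenizer, which is not shown. `check_token_budget`, the 90% warning threshold, and the status strings are illustrative choices and are not part of any Mistral library.

```python
import logging

logger = logging.getLogger("token_budget")


def check_token_budget(prompt_tokens: int, max_tokens: int,
                       context_length: int = 8192, warn_ratio: float = 0.9) -> str:
    """Classify a request as 'ok', 'warn', or 'exceeded' and log accordingly."""
    total = prompt_tokens + max_tokens
    if total > context_length:
        logger.error("Request needs %d tokens; limit is %d", total, context_length)
        return "exceeded"
    if total >= warn_ratio * context_length:
        logger.warning("Request uses %d of %d tokens (%.0f%%)",
                       total, context_length, 100 * total / context_length)
        return "warn"
    return "ok"


logging.basicConfig(level=logging.INFO)
print(check_token_budget(2000, 500))  # ok
print(check_token_budget(7000, 500))  # warn: 7500 >= 0.9 * 8192 (7372.8)
print(check_token_budget(7800, 500))  # exceeded: 8300 > 8192
```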
Causes & fixes
Input prompt is too long, leaving insufficient tokens for the completion.
Truncate or summarize the prompt to reduce token count before sending to Mistral.
The requested max_tokens plus the prompt tokens exceed the model's limit.
Lower the max_tokens parameter in the generation call to fit within the model's context window.
Not accounting for token overhead from special tokens or system messages in the prompt.
Include overhead tokens in your token count calculations and reduce prompt or max_tokens accordingly.
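The three fixes can be combined into a single adjustment applied before sending. This sketch works on a list of prompt token IDs, which would come from whatever Mistral-compatible tokenizer you use (not shown). It also assumes you already know the overhead from special tokens and system messages. `fit_request` is a hypothetical helper name.

```python
from typing import List, Tuple


def fit_request(prompt_tokens: List[int], max_tokens: int,
                context_length: int = 8192, overhead: int = 0) -> Tuple[List[int], int]:
    """Return (prompt_tokens, max_tokens) adjusted so that
    len(prompt_tokens) + overhead + max_tokens <= context_length."""
    budget = context_length - overhead  # reserve room for special tokens / system messages
    if budget <= 0:
        raise ValueError("overhead alone fills the context window")
    max_tokens = min(max_tokens, budget)  # a completion can never be larger than the window
    prompt_budget = budget - max_tokens
    if len(prompt_tokens) > prompt_budget:
        # Truncate from the front, keeping the most recent tokens.
        prompt_tokens = prompt_tokens[len(prompt_tokens) - prompt_budget:]
    return prompt_tokens, max_tokens


tokens = list(range(8000))
trimmed, limit = fit_request(tokens, max_tokens=500, overhead=100)
print(len(trimmed), limit)  # 7592 500
```

Keeping the tail of the prompt suits chat transcripts, where the most recent turns matter most. If your instructions sit at the start of the prompt, keep the head instead (`prompt_tokens[:prompt_budget]`). Either way, the trimmed IDs would still need to be decoded back to text with the same tokenizer before sending.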
Code: broken vs fixed
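# Broken: prompt tokens + max_tokens can exceed the 8192-token window
import os
from mistral import Mistral

client = Mistral(api_key=os.environ['MISTRAL_API_KEY'])
prompt = 'A' * 8000  # Very long prompt (8000 characters; its token count depends on the tokenizer)
response = client.generate(prompt=prompt, max_tokens=500)  # Raises ContextLengthExceededError if prompt tokens + 500 > 8192
print(response.text)

# Fixed: leave room for the completion inside the 8192-token window
import os
from mistral import Mistral

client = Mistral(api_key=os.environ['MISTRAL_API_KEY'])
prompt = 'A' * 7500  # Shorter prompt; characters are not tokens, so confirm the count with a Mistral-compatible tokenizer
max_tokens = 500  # Completion budget, counted against the same window as the prompt
response = client.generate(prompt=prompt, max_tokens=max_tokens)  # Succeeds when prompt tokens + max_tokens <= 8192
print(response.text)
Workaround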
Catch ContextLengthExceededError and programmatically truncate the prompt or reduce max_tokens, then retry the request automatically.
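Below is a sketch of the catch-and-retry loop. To stay self-contained, it defines a local stand-in for `mistral.errors.ContextLengthExceededError` and takes the request as a `send(prompt, max_tokens)` callable, so you would replace the client call with your own. Exact token counts are not known at this point, so each retry shrinks the prompt by a fixed fraction of its characters. The 0.75 factor and the attempt cap are arbitrary choices.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ContextLengthExceededError(Exception):
    """Local stand-in; in real code, catch the exception your client raises."""


def generate_with_retry(send: Callable[[str, int], T], prompt: str, max_tokens: int,
                        max_attempts: int = 4, shrink: float = 0.75) -> T:
    """Call send(); on a context-length error, keep the last `shrink` fraction
    of the prompt's characters and try again, up to max_attempts calls."""
    for attempt in range(max_attempts):
        try:
            return send(prompt, max_tokens)
        except ContextLengthExceededError:
            if attempt == max_attempts - 1:
                raise  # give up and surface the original error
            keep = int(len(prompt) * shrink)
            prompt = prompt[len(prompt) - keep:]  # drop the oldest text first
    raise ValueError("max_attempts must be at least 1")


def fake_send(prompt: str, max_tokens: int) -> int:
    """Pretend model with a 1000-unit window, counting one unit per character."""
    if len(prompt) + max_tokens > 1000:
        raise ContextLengthExceededError()
    return len(prompt)


print(generate_with_retry(fake_send, "x" * 1500, max_tokens=100))  # 843
```

The same loop could lower max_tokens on each retry instead of, or before, cutting the prompt. That is a better fit when the full input must be preserved.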
Prevention
Count tokens before each request with a Mistral-compatible tokenizer, and enforce limits on the prompt and max_tokens so that their sum never exceeds the model's context window.
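A pre-send guard enforces the limit before any request leaves your process. The `count_tokens` argument is a placeholder for a Mistral-compatible tokenizer. The whitespace counter in the usage lines is only a crude stand-in for testing and usually undercounts real tokens. `guarded_send` and its `overhead` parameter are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def guarded_send(send: Callable[[str, int], T], count_tokens: Callable[[str], int],
                 prompt: str, max_tokens: int,
                 context_length: int = 8192, overhead: int = 0) -> T:
    """Refuse to call send() unless prompt tokens + overhead + max_tokens fit."""
    needed = count_tokens(prompt) + overhead + max_tokens
    if needed > context_length:
        raise ValueError(
            f"request needs {needed} tokens but the context window is {context_length}; "
            "shorten the prompt or lower max_tokens"
        )
    return send(prompt, max_tokens)


def whitespace_count(text: str) -> int:
    return len(text.split())  # crude stand-in, NOT Mistral's tokenizer


def echo(prompt: str, max_tokens: int) -> str:
    return f"sent {len(prompt)} chars"  # fake client for demonstration


print(guarded_send(echo, whitespace_count, "hello world", max_tokens=100))  # sent 11 chars
```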