High severity · Intermediate · Fix: 2-5 min

ContextLengthExceededError

mistral.errors.ContextLengthExceededError

What this error means
Mistral rejects a request when the prompt tokens plus the requested completion tokens exceed the model's maximum context length.

Stack trace

traceback
mistral.errors.ContextLengthExceededError: The input plus completion tokens exceed the model's maximum context length of 8192 tokens.
QUICK FIX
Shorten the prompt or lower max_tokens so that prompt tokens plus max_tokens do not exceed the model's maximum context length (8192 tokens in the trace above).

Why it happens

Each Mistral model has a fixed maximum context length (token limit) that covers both the prompt and the requested completion. When the prompt tokens plus the requested output length (max_tokens) exceed this limit, the request is rejected with this error instead of being processed.

Detection

Check token usage before sending each request: tokenize the input, add the requested completion tokens (max_tokens), and compare the total against the model's maximum context length. Log and alert when the total approaches or exceeds that limit.
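A minimal sketch of such a pre-flight check. It assumes a limit of 8192 tokens (the figure from the stack trace above) and an arbitrary 90% warning threshold. The `count_tokens` helper is a hypothetical whitespace-based stand-in so the example runs on its own; real counts require a tokenizer that matches the model you call.

```python
import logging

logger = logging.getLogger("context_budget")

CONTEXT_LIMIT = 8192  # assumption: taken from the error message above
WARN_RATIO = 0.9      # assumption: warn once 90% of the limit is used


def count_tokens(text: str) -> int:
    """Stand-in token counter (whitespace split). NOT a real tokenizer;
    replace it with one that matches the model you call."""
    return len(text.split())


def check_budget(prompt: str, max_tokens: int, limit: int = CONTEXT_LIMIT) -> str:
    """Return 'ok', 'warn', or 'exceeded' for prompt tokens + max_tokens."""
    total = count_tokens(prompt) + max_tokens
    if total > limit:
        logger.error("Request needs %d tokens, limit is %d", total, limit)
        return "exceeded"
    if total >= WARN_RATIO * limit:
        logger.warning("Request uses %d of %d tokens", total, limit)
        return "warn"
    return "ok"
```

Calling `check_budget(prompt, max_tokens)` before each request lets you skip or shrink requests that would be rejected, and the log lines give you something to alert on.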

Causes & fixes

1

Input prompt is too long, leaving insufficient tokens for the completion.

✓ Fix

Truncate or summarize the prompt to reduce token count before sending to Mistral.

2

The requested max_tokens plus the prompt tokens exceed the model's context length.

✓ Fix

Lower the max_tokens parameter in the generation call to fit within the model's context window.

3

Not accounting for token overhead from special tokens or system messages in the prompt.

✓ Fix

Include overhead tokens in your token count calculations and reduce prompt or max_tokens accordingly.
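The three fixes above can be combined into one budgeting step. This is a sketch under stated assumptions, not the library's own method: the 8192 limit comes from the stack trace, `OVERHEAD_TOKENS` is an illustrative allowance for special tokens and system messages, and the prompt is represented as a plain list of token IDs so the example runs without a tokenizer. The function name `fit_request` is hypothetical.

```python
from typing import List, Tuple

CONTEXT_LIMIT = 8192   # assumption: from the error message
OVERHEAD_TOKENS = 16   # assumption: illustrative allowance, not a measured value


def fit_request(prompt_tokens: List[int], max_tokens: int,
                limit: int = CONTEXT_LIMIT,
                overhead: int = OVERHEAD_TOKENS,
                min_completion: int = 1) -> Tuple[List[int], int]:
    """Return (prompt_tokens, max_tokens) adjusted to fit within `limit`.

    Fix 3: reserve `overhead` tokens up front.
    Fix 2: lower max_tokens before touching the prompt.
    Fix 1: truncate the prompt only if a minimal completion still would not fit.
    """
    usable = limit - overhead
    if usable < min_completion:
        raise ValueError("limit is too small for the requested overhead")
    # Fix 1: the prompt may use at most what is left after the smallest completion.
    max_prompt = usable - min_completion
    kept = prompt_tokens[:max_prompt]  # keeps the start of the prompt
    # Fix 2: the completion gets the remaining room, capped at the requested value.
    allowed = min(max_tokens, usable - len(kept))
    return kept, allowed
```

With these defaults, an 8000-token prompt and `max_tokens=500` comes back untruncated with `max_tokens` lowered to 176. Truncation here keeps the beginning of the prompt; for chat histories you may prefer to keep the most recent tokens instead.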

Code: broken vs fixed

Broken - triggers the error
python
import os
from mistral import Mistral

client = Mistral(api_key=os.environ['MISTRAL_API_KEY'])

prompt = 'A' * 8000  # Very long prompt (8000 characters; the token count depends on the tokenizer)
response = client.generate(prompt=prompt, max_tokens=500)  # Raises ContextLengthExceededError when prompt tokens + 500 exceed the limit
print(response.text)
Fixed - works correctly
python
import os
from mistral import Mistral

client = Mistral(api_key=os.environ['MISTRAL_API_KEY'])

prompt = 'A' * 7500  # Shorter prompt (length is in characters, not tokens)
max_tokens = 500  # Completion budget
response = client.generate(prompt=prompt, max_tokens=max_tokens)  # Succeeds when prompt tokens + max_tokens fit within the limit
print(response.text)
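The fix shortens the prompt so that prompt tokens plus max_tokens stay within the model's maximum context length. String length counts characters, not tokens, so confirm the actual token count with a tokenizer rather than relying on character counts.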

Workaround

Catch ContextLengthExceededError and programmatically truncate the prompt or reduce max_tokens, then retry the request automatically.
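One way to sketch that retry loop is shown below. To keep the example runnable on its own, it defines a local stand-in exception with the same name as the library's error and takes the actual request as a `send` callable; both are assumptions for illustration. The halving of `max_tokens`, the 20% prompt cut, and the default limits are arbitrary choices, and the prompt cut is character-based, which is crude.

```python
from typing import Callable


class ContextLengthExceededError(Exception):
    """Local stand-in for the library's error so this sketch runs on its own."""


def generate_with_retry(send: Callable[[str, int], str], prompt: str,
                        max_tokens: int, max_attempts: int = 4,
                        min_tokens: int = 64) -> str:
    """Call send(prompt, max_tokens), shrinking the request after each failure."""
    for attempt in range(max_attempts):
        try:
            return send(prompt, max_tokens)
        except ContextLengthExceededError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            if max_tokens > min_tokens:
                # First give up completion room, but never go below min_tokens.
                max_tokens = max(min_tokens, max_tokens // 2)
            else:
                # Then drop the first 20% of the prompt (by characters).
                prompt = prompt[len(prompt) // 5:]
    raise ValueError("max_attempts must be at least 1")
```

In real use, `send` would wrap your client call and the `except` clause would catch the error class imported from the library. Bounding the number of attempts keeps a badly oversized request from looping indefinitely.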

Prevention

Implement token counting before requests and enforce limits on prompt and max_tokens to never exceed Mistral's context window, using tokenizer libraries compatible with Mistral.

Python 3.9+ · mistral >=1.0.0 · tested on 1.0.x
Verified 2026-04 · mistral-1.0, mistral-1.0-pro