High severity intermediate · Fix: 2-5 min

ContextLengthExceededError

cerebras.client.errors.ContextLengthExceededError

What this error means
The Cerebras SDK raises ContextLengthExceededError when the tokens in a request (the prompt plus any conversation history) exceed the model's maximum context length.

Stack trace

traceback
Traceback (most recent call last):
  File "app.py", line 42, in <module>
    response = client.chat.completions.create(model="cerebras-gpt-13b", messages=messages)
  File "/usr/local/lib/python3.9/site-packages/cerebras/client/chat.py", line 88, in create
    raise ContextLengthExceededError("Input prompt exceeds maximum context length")
cerebras.client.errors.ContextLengthExceededError: Input prompt exceeds maximum context length
QUICK FIX
Truncate your input prompt and conversation history to fit within the Cerebras model's max token context length before sending the request.

Why it happens

Cerebras models have a fixed maximum context length (number of tokens) they can process in a single request. When the combined tokens of the prompt and conversation history exceed this limit, the SDK raises ContextLengthExceededError to prevent invalid requests.

Detection

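Estimate the token count of every request before sending it and compare the result with the model's documented maximum context length. Use a token counting utility if your SDK version provides one; otherwise a character-based estimate is enough for an early warning. Logging the estimate per request shows when a conversation is approaching the limit.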
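A minimal sketch of the character-based estimate described above. The 4-characters-per-token ratio is the same rough heuristic used in the fixed example further down, and the per-message overhead is an assumption; neither value comes from the Cerebras SDK, and `estimate_tokens` and `exceeds_context` are illustrative helper names, not SDK functions.

```python
import math

CHARS_PER_TOKEN = 4        # assumption: rough average for English text
PER_MESSAGE_OVERHEAD = 4   # assumption: role/formatting tokens per message


def estimate_tokens(messages):
    """Return a rough token estimate for a list of chat messages."""
    total = 0
    for message in messages:
        content_tokens = math.ceil(len(message["content"]) / CHARS_PER_TOKEN)
        total += content_tokens + PER_MESSAGE_OVERHEAD
    return total


def exceeds_context(messages, max_context_tokens):
    """True if the estimate is above the model's context length."""
    return estimate_tokens(messages) > max_context_tokens
```

Because the estimate is approximate, treat a result close to the limit as a warning rather than a guarantee that the request will succeed.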

Causes & fixes

1

The input messages or prompt exceed the Cerebras model's maximum token context length.

✓ Fix

Truncate or shorten the prompt and conversation history to fit within the model's documented max token limit before calling the API.

2

Accumulated conversation history grows too large over multiple turns without pruning.

✓ Fix

Implement conversation history management by removing or summarizing older messages to keep total tokens under the limit.

3

Using a model variant with a smaller context window than expected.

✓ Fix

Verify the model's max context length in the Cerebras documentation and switch to a model variant with a larger context window if needed.
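The history pruning described in fix 2 could look roughly like the sketch below. It keeps system messages, always keeps the most recent message, and drops the oldest turns first. `prune_history` and `approx_tokens` are hypothetical helpers, and the character-based count is an assumption standing in for a real tokenizer.

```python
import math


def approx_tokens(messages, chars_per_token=4):
    """Rough estimate; assumes about 4 characters per token."""
    return sum(math.ceil(len(m["content"]) / chars_per_token) for m in messages)


def prune_history(messages, max_tokens, count_tokens=approx_tokens):
    """Drop the oldest non-system messages until the estimate fits.

    System messages are moved to the front and kept; the newest
    message is never dropped, even if it alone is over the limit.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while len(rest) > 1 and count_tokens(system + rest) > max_tokens:
        rest.pop(0)  # oldest turn goes first
    return system + rest
```

Summarizing the dropped turns instead of discarding them preserves more context, at the cost of an extra model call.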

Code: broken vs fixed

Broken - triggers the error
python
from cerebras import Client
import os

client = Client(api_key=os.environ["CEREBRAS_API_KEY"])

messages = [
    {"role": "user", "content": "A" * 50000}  # Prompt far longer than the model's context window
]

response = client.chat.completions.create(model="cerebras-gpt-13b", messages=messages)  # Raises ContextLengthExceededError
print(response.choices[0].message.content)
Fixed - works correctly
python
from cerebras import Client
import os

client = Client(api_key=os.environ["CEREBRAS_API_KEY"])

# Truncate the prompt to the model's max context length (e.g., 8192 tokens; check the docs for your model)
max_tokens = 8192
prompt_text = "A" * 50000  # Original prompt
# Rough heuristic: ~4 characters per token. Use a real token count if you need a tighter bound.
truncated_prompt = prompt_text[:max_tokens * 4]

messages = [
    {"role": "user", "content": truncated_prompt}
]

response = client.chat.completions.create(model="cerebras-gpt-13b", messages=messages)  # Fixed: prompt truncated
print(response.choices[0].message.content)  # Prints model output
The fixed version truncates the prompt to fit within the model's maximum context length, so the request no longer raises ContextLengthExceededError.

Workaround

Catch ContextLengthExceededError and programmatically truncate or summarize the input prompt before retrying the request to avoid crashes.
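One way to sketch this workaround is a bounded retry loop that shrinks the request after each failure. To keep the example runnable on its own, `ContextLengthExceededError` is defined here as a stand-in for the SDK exception named in this article, and `send` is any callable that performs the request; `create_with_retry` and `shrink` are hypothetical helpers.

```python
class ContextLengthExceededError(Exception):
    """Stand-in for the SDK exception; import the real one in your app."""


def shrink(messages):
    """Drop the oldest non-system message, or halve the only one left."""
    non_system = [i for i, m in enumerate(messages) if m["role"] != "system"]
    if not non_system:
        return messages  # nothing that is safe to shrink
    first = non_system[0]
    if len(non_system) > 1:
        return messages[:first] + messages[first + 1:]
    m = messages[first]
    half = max(1, len(m["content"]) // 2)
    shortened = dict(m, content=m["content"][:half])
    return messages[:first] + [shortened] + messages[first + 1:]


def create_with_retry(send, messages, max_attempts=3):
    """Call send(messages), shrinking the input after each context error."""
    attempt_messages = list(messages)  # leave the caller's list untouched
    for attempt in range(max_attempts):
        try:
            return send(attempt_messages)
        except ContextLengthExceededError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            attempt_messages = shrink(attempt_messages)
```

With the client from the examples above, `send` could be a small wrapper such as `lambda msgs: client.chat.completions.create(model="cerebras-gpt-13b", messages=msgs)`. Capping the attempts prevents an endless loop when the request cannot be made small enough.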

Prevention

Implement token counting and prompt length checks in your application logic to ensure inputs never exceed the Cerebras model's documented max context length before API calls.
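A pre-flight check along these lines can reject oversized requests before they reach the API. This is a sketch under stated assumptions: the context window is taken to cover both the prompt and the generated reply (so some tokens are reserved for output), the 512-token reserve and 4-characters-per-token ratio are illustrative defaults, and `check_prompt_budget` is a hypothetical helper rather than part of the SDK.

```python
import math


def check_prompt_budget(messages, context_window,
                        reserved_output_tokens=512, chars_per_token=4):
    """Return the estimated input tokens, or raise ValueError if over budget."""
    input_budget = context_window - reserved_output_tokens
    if input_budget <= 0:
        raise ValueError("reserved_output_tokens leaves no room for input")
    estimated = sum(
        math.ceil(len(m["content"]) / chars_per_token) for m in messages
    )
    if estimated > input_budget:
        raise ValueError(
            f"estimated {estimated} input tokens exceeds budget of {input_budget}"
        )
    return estimated
```

Calling this immediately before each request turns an API error into a local, testable failure that the application can handle by truncating or summarizing the input.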

Python 3.9+ · cerebras-sdk >=1.0.0 · tested on 1.2.3
Verified 2026-04