ContextLengthExceededError
cerebras.client.errors.ContextLengthExceededError
Stack trace
Traceback (most recent call last):
  File "app.py", line 42, in <module>
    response = client.chat.completions.create(model="cerebras-gpt-13b", messages=messages)
  File "/usr/local/lib/python3.9/site-packages/cerebras/client/chat.py", line 88, in create
    raise ContextLengthExceededError("Input prompt exceeds maximum context length")
cerebras.client.errors.ContextLengthExceededError: Input prompt exceeds maximum context length
Why it happens
Cerebras models have a fixed maximum context length (number of tokens) they can process in a single request. When the combined tokens of the prompt and conversation history exceed this limit, the SDK raises ContextLengthExceededError to prevent invalid requests.
Detection
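Count the tokens in the prompt and conversation history before sending each request, using a token-counting utility (from the Cerebras SDK, if available, or the model's tokenizer) or a character-based estimate, and compare the total against the model's maximum context length.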
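A pre-flight check along these lines can be sketched as follows. This is a minimal example, not SDK code: it assumes a rough heuristic of about 4 characters per token instead of a real tokenizer, and `estimate_tokens`, `fits_context`, and the 8192-token limit are illustrative names and values.

```python
import math

CHARS_PER_TOKEN = 4  # rough heuristic (assumption), not a real tokenizer


def estimate_tokens(messages):
    """Estimate the token count of a chat message list from its character length."""
    total_chars = sum(len(m["content"]) for m in messages)
    return math.ceil(total_chars / CHARS_PER_TOKEN)


def fits_context(messages, max_context_tokens, reserved_for_reply=0):
    """Return True if the estimated prompt size leaves room for the reply."""
    return estimate_tokens(messages) + reserved_for_reply <= max_context_tokens


messages = [{"role": "user", "content": "A" * 50000}]
print(estimate_tokens(messages))     # 12500
print(fits_context(messages, 8192))  # False
```

Because the estimate is approximate, it is safer to keep a margin below the documented limit than to fill the window exactly.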
Causes & fixes
Cause: The input messages or prompt exceed the Cerebras model's maximum token context length.
Fix: Truncate or shorten the prompt and conversation history to fit within the model's documented max token limit before calling the API.
Cause: Accumulated conversation history grows too large over multiple turns without pruning.
Fix: Implement conversation history management by removing or summarizing older messages to keep total tokens under the limit.
Cause: Using a model variant with a smaller context window than expected.
Fix: Verify the model's max context length in the Cerebras documentation and switch to a model variant with a larger context window if needed.
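Pruning older messages, as suggested for growing conversation history, might look like the sketch below. It is one possible approach under stated assumptions: token counts are estimated at about 4 characters per token, system messages are kept and moved to the front, and `prune_history` is a hypothetical helper, not part of the Cerebras SDK.

```python
import math


def estimate_tokens(messages, chars_per_token=4):
    """Rough token estimate from character counts (heuristic, not a tokenizer)."""
    return math.ceil(sum(len(m["content"]) for m in messages) / chars_per_token)


def prune_history(messages, max_context_tokens):
    """Drop the oldest non-system messages until the estimate fits the limit."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    # Always keep the most recent turn, even if it alone is over budget;
    # that case still needs truncation or summarization of the message itself.
    while len(turns) > 1 and estimate_tokens(system + turns) > max_context_tokens:
        turns.pop(0)
    return system + turns


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "A" * 20000},
    {"role": "assistant", "content": "B" * 20000},
    {"role": "user", "content": "C" * 4000},
]
pruned = prune_history(history, max_context_tokens=8192)
print([m["role"] for m in pruned])  # ['system', 'assistant', 'user']
```

Dropping whole messages is the simplest policy; replacing the dropped turns with a short summary keeps more context at the cost of an extra model call.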
Code: broken vs fixed
# Broken: the prompt is far longer than the model's context window
from cerebras import Client
import os

client = Client(api_key=os.environ["CEREBRAS_API_KEY"])
messages = [
    {"role": "user", "content": "A" * 50000}  # Prompt too long; triggers the error
]
response = client.chat.completions.create(model="cerebras-gpt-13b", messages=messages)  # Raises ContextLengthExceededError
print(response.choices[0].message.content)  # Never reached

# Fixed: the prompt is truncated before the request is sent
from cerebras import Client
import os

client = Client(api_key=os.environ["CEREBRAS_API_KEY"])

# Truncate the prompt to the model's maximum context length (e.g., 8192 tokens)
max_tokens = 8192
prompt_text = "A" * 50000  # Original prompt
truncated_prompt = prompt_text[:max_tokens * 4]  # Rough truncation, assuming about 4 characters per token
messages = [
    {"role": "user", "content": truncated_prompt}
]
response = client.chat.completions.create(model="cerebras-gpt-13b", messages=messages)  # Fixed: prompt truncated
print(response.choices[0].message.content)  # Prints model output
Workaround
Catch ContextLengthExceededError and programmatically truncate or summarize the input prompt before retrying the request to avoid crashes.
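The catch-and-retry pattern could be sketched as below. To keep the example runnable without the SDK, it defines a stand-in `ContextLengthExceededError` and a hypothetical `fake_send` function in place of `client.chat.completions.create`; in an application the exception would presumably be imported from `cerebras.client.errors`, as the stack trace indicates. The halving strategy and the `create_with_retry` helper are assumptions chosen for illustration.

```python
class ContextLengthExceededError(Exception):
    """Stand-in for the SDK exception so this sketch runs without the SDK."""


def create_with_retry(send, messages, max_attempts=4, shrink=0.5):
    """Call send(messages); on a context-length error, shorten the last message and retry."""
    current = [dict(m) for m in messages]  # copy so the caller's list is untouched
    for attempt in range(max_attempts):
        try:
            return send(current)
        except ContextLengthExceededError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            last = current[-1]
            keep = max(1, int(len(last["content"]) * shrink))
            last["content"] = last["content"][:keep]  # keep the start of the text


def fake_send(messages):
    """Hypothetical stand-in for client.chat.completions.create."""
    size = sum(len(m["content"]) for m in messages)
    if size > 32768:
        raise ContextLengthExceededError("Input prompt exceeds maximum context length")
    return f"ok: {size} chars"


print(create_with_retry(fake_send, [{"role": "user", "content": "A" * 50000}]))
# ok: 25000 chars
```

Bounding the number of attempts matters here: if the limit is exceeded by content other than the last message, repeated truncation will not help, and the error should surface to the caller.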
Prevention
Implement token counting and prompt length checks in your application logic to ensure inputs never exceed the Cerebras model's documented max context length before API calls.