High severity · HTTP 429 · Intermediate · Fix: 5-10 min

RateLimitError

cerebras.RateLimitError (HTTP 429)

What this error means
The Cerebras API rejected the request because your API key exceeded its allowed request quota for the current time window. The API responds with HTTP 429, which the client raises as RateLimitError.

Stack trace

traceback
cerebras.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit exceeded', 'type': 'requests', 'code': 'rate_limit_exceeded'}}
QUICK FIX
Catch cerebras.RateLimitError and retry with exponential backoff, waiting longer after each failure instead of resending the request immediately.
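A minimal sketch of that retry pattern, written so it runs without the SDK installed. The `RateLimitError` class here is a local stand-in for the exception named on this page, and `backoff_delay` and `call_with_backoff` are hypothetical helper names, not part of any Cerebras library. The random jitter is an addition beyond plain backoff: it spreads retries out so that several clients do not all retry at the same instant.

```python
import random
import time


class RateLimitError(Exception):
    """Local stand-in for the SDK's rate-limit exception (assumption)."""


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Delay before retry `attempt` (0-based): doubles each time, capped, with full jitter."""
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()


def call_with_backoff(func, max_attempts=5, sleep=time.sleep,
                      rate_limit_exc=RateLimitError):
    """Call func(); on a rate-limit error wait and retry, re-raising after max_attempts."""
    for attempt in range(max_attempts):
        try:
            return func()
        except rate_limit_exc:
            if attempt == max_attempts - 1:
                raise  # Out of retries: let the caller see the error
            sleep(backoff_delay(attempt))
```

To use it with a real client, pass the SDK's exception class as `rate_limit_exc` and wrap the request in a zero-argument callable, for example `call_with_backoff(lambda: client.call_model('model-name', input_data), rate_limit_exc=cerebras.RateLimitError)`.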

Why it happens

Cerebras enforces strict rate limits on API requests to prevent abuse and ensure fair usage. When your application sends more requests than allowed within a time window, the API returns a 429 RateLimitError to signal you must slow down.

Detection

Monitor API responses for HTTP 429 status codes or catch cerebras.RateLimitError exceptions to detect rate limit breaches before your app crashes.
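One way to make those breaches visible is to count and log them at the call site. The sketch below is illustrative only: `is_rate_limited`, `monitored`, and `error_counts` are hypothetical names, and the check assumes the raised exception either exposes a `status_code` attribute or is a class named `RateLimitError`. Adjust it to whatever your installed SDK version actually raises.

```python
import logging
from collections import Counter

logger = logging.getLogger("rate_limit_monitor")
error_counts = Counter()


def is_rate_limited(exc):
    """True if the exception looks like an HTTP 429 (attribute name is an assumption)."""
    return (getattr(exc, "status_code", None) == 429
            or type(exc).__name__ == "RateLimitError")


def monitored(func, *args, **kwargs):
    """Run func, record any rate-limit error, then re-raise it unchanged."""
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        if is_rate_limited(exc):
            error_counts["rate_limited"] += 1
            logger.warning("Rate limit hit (%d so far): %s",
                           error_counts["rate_limited"], exc)
        raise
```

A rising `error_counts["rate_limited"]` value is an early signal to slow down or add throttling before the errors reach users.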

Causes & fixes

1

Sending more requests in a short period than the Cerebras API allows

✓ Fix

Implement exponential backoff and retry logic with delays between requests to stay within rate limits.

2

Parallel or concurrent requests exceeding the allowed quota

✓ Fix

Throttle concurrency by limiting the number of simultaneous API calls using a queue or semaphore.

3

Using an API key with a low rate limit, or sharing one key across services so their combined traffic exhausts the quota

✓ Fix

Request a higher rate limit from Cerebras support or use dedicated API keys per service to distribute load.
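The concurrency fix in cause 2 can be sketched with a semaphore from the standard library. Everything named here is an assumption for illustration: `MAX_IN_FLIGHT` is an arbitrary value, not a documented Cerebras limit, and `fake_request` is a stand-in for a real API call that also records how many calls overlap so the effect is observable.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 2  # Assumption: choose a value below your key's allowance
_slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)
_lock = threading.Lock()
stats = {"active": 0, "peak": 0}


def fake_request(i):
    """Hypothetical stand-in for an API call; tracks overlapping calls."""
    with _lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # Simulated network latency
    with _lock:
        stats["active"] -= 1
    return i * 2


def throttled(func, *args, **kwargs):
    """Run func only while holding one of the MAX_IN_FLIGHT slots."""
    with _slots:
        return func(*args, **kwargs)


def run_batch(items, workers=8):
    """Submit every item to a thread pool; the semaphore caps real concurrency."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda item: throttled(fake_request, item), items))
```

Even with eight worker threads, at most `MAX_IN_FLIGHT` calls run at once. In real code, replace `fake_request` with the function that performs the API request.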

Code: broken vs fixed

Broken - triggers the error
python
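from cerebras import CerebrasClient
client = CerebrasClient(api_key='hardcoded_key')  # Hardcoded secret: avoid this
input_data = 'Hello'  # Placeholder input
# Raises RateLimitError once the quota is exhausted; nothing catches it
response = client.call_model('model-name', input_data)
print(response)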
Fixed - works correctly
python
import os
import time

from cerebras import CerebrasClient, RateLimitError

client = CerebrasClient(api_key=os.environ['CEREBRAS_API_KEY'])
input_data = 'Hello'  # Placeholder input

delay = 2  # Seconds to wait before the first retry
for attempt in range(5):
    try:
        response = client.call_model('model-name', input_data)
        print(response)
        break
    except RateLimitError:
        if attempt == 4:
            raise  # Out of retries: surface the error
        print(f'Rate limit exceeded, retrying in {delay}s...')
        time.sleep(delay)
        delay *= 2  # Exponential backoff: 2, 4, 8, 16 seconds
Replaced the hardcoded API key with an environment variable and wrapped the call in a bounded retry loop that catches RateLimitError and doubles the delay after each failure. If the final attempt is also rate limited, the error is re-raised.

Workaround

As a stopgap, wrap API calls in try/except RateLimitError, wait a fixed delay (e.g., 10 seconds), then retry the request. Cap the number of retries so that a persistent limit cannot loop forever.

Prevention

Implement client-side rate limiting with exponential backoff and concurrency control, and request higher quotas from Cerebras if needed to avoid hitting limits.
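Client-side rate limiting can be as simple as spacing requests evenly. The sketch below assumes you know your quota as a requests-per-minute figure; `MinIntervalLimiter` is a hypothetical helper, not an SDK class, and the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import threading
import time


class MinIntervalLimiter:
    """Spaces calls at least 60 / requests_per_minute seconds apart (thread-safe)."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0  # Earliest time the next call may start

    def wait(self):
        """Block until this caller's reserved slot arrives; return the seconds waited."""
        with self._lock:
            now = self._clock()
            delay = max(0.0, self._next_allowed - now)
            # Reserve the following slot before releasing the lock
            self._next_allowed = max(now, self._next_allowed) + self.interval
        if delay:
            self._sleep(delay)
        return delay
```

Call `limiter.wait()` immediately before each API request. Setting `requests_per_minute` somewhat below the real quota leaves headroom for other processes that share the same key.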

Python 3.9+ · cerebras-sdk >=1.0.0 · tested on 1.2.0
Verified 2026-04