RateLimitError
groq.RateLimitError (HTTP 429)
Stack trace
groq.RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit exceeded: tokens per minute quota reached', 'type': 'rate_limit', 'code': 'rate_limit_exceeded'}}
Why it happens
Groq enforces a tokens-per-minute quota to protect service stability. When the requests your application sends within one minute cumulatively exceed that quota, the API rejects further requests with HTTP 429, which the Python SDK raises as groq.RateLimitError.
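As a rough illustration of the arithmetic, the sketch below estimates how many worst-case requests fit into a quota and how far apart they would need to be spaced. It assumes the quota counts prompt and completion tokens together; the function names and the 6000-token quota are hypothetical, so check the actual limit for your account and model.

```python
def max_requests_per_minute(tpm_quota: int, prompt_tokens: int, max_tokens: int) -> int:
    """Upper bound on requests per minute under a tokens-per-minute quota.

    Worst case: every request uses its whole prompt plus max_tokens of output.
    """
    per_request = prompt_tokens + max_tokens
    if per_request <= 0:
        raise ValueError("a request must use at least one token")
    return tpm_quota // per_request


def min_seconds_between_requests(tpm_quota: int, prompt_tokens: int, max_tokens: int) -> float:
    """Even spacing that keeps worst-case usage inside the quota."""
    per_minute = max_requests_per_minute(tpm_quota, prompt_tokens, max_tokens)
    if per_minute == 0:
        raise ValueError("a single request is larger than the whole quota")
    return 60.0 / per_minute


# Hypothetical numbers: 6000 TPM quota, 200-token prompt, max_tokens=1000.
print(max_requests_per_minute(6000, 200, 1000))       # 5
print(min_seconds_between_requests(6000, 200, 1000))  # 12.0
```

Under these assumed numbers, a sixth request inside the same minute would be the one that gets rejected.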
Detection
Catch groq.RateLimitError to detect requests that have already been rejected, and monitor your token usage so you can see the application approaching the tokens per minute quota before requests start to fail.
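One way to monitor usage is to read the rate-limit headers on API responses. The sketch below parses a plain header mapping. The header names `retry-after` and `x-ratelimit-remaining-tokens` are assumptions here and should be verified against Groq's rate-limit documentation; both helper functions are hypothetical.

```python
from typing import Mapping, Optional


def retry_delay_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Seconds to wait, taken from a 'retry-after' header expressed in seconds."""
    raw = headers.get("retry-after")  # assumed header name
    if raw is None:
        return default
    try:
        return max(0.0, float(raw))
    except ValueError:
        # For example an HTTP-date value, which this sketch does not parse.
        return default


def remaining_tokens(headers: Mapping[str, str]) -> Optional[int]:
    """Tokens left in the current window, or None if the header is absent or unparsable."""
    raw = headers.get("x-ratelimit-remaining-tokens")  # assumed header name
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        return None
```

Keys are looked up in lowercase, so a plain dict must use lowercase names. With the SDK, the headers would typically come from the caught exception's `response.headers`, which is also an assumption worth checking against the installed version.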
Causes & fixes
Cause: Requests sent in a short time frame add up to more tokens than the tokens per minute quota allows.
Fix: Implement request pacing or batching to keep token usage under the Groq tokens per minute limit.
Cause: Parallel or concurrent requests cumulatively exceed the token rate limit.
Fix: Serialize requests to the Groq API or limit their concurrency to avoid bursts that exceed the token quota.
Cause: A high max tokens value per request is used without lowering the request frequency to match.
Fix: Reduce max tokens per request or increase the delay between requests to stay within the tokens per minute allowance.
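The concurrency fix above can be sketched with a bounded thread pool. This is a minimal illustration, not a Groq-specific API: `run_with_concurrency_cap` is a hypothetical helper, and `call` stands in for whatever function issues a single request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_concurrency_cap(
    call: Callable[[T], R],
    items: Iterable[T],
    max_in_flight: int = 2,
) -> List[R]:
    """Apply `call` to every item with at most `max_in_flight` calls running at once.

    Results are returned in the same order as `items`.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    # The pool size is the cap: no more than max_in_flight workers exist.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(call, items))
```

Setting `max_in_flight=1` serializes the requests entirely. A cap on concurrency limits bursts but does not by itself bound tokens per minute, so it is usually combined with pacing.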
Code: broken vs fixed
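# Broken: the 429 propagates as an unhandled exception
from groq import Groq
client = Groq(api_key='my_api_key')
response = client.chat.completions.create(
    model='llama-3.1-8b-instant',  # example model ID; substitute your own
    messages=[{'role': 'user', 'content': 'Hello world'}],
    max_tokens=1000,
)  # raises RateLimitError once the quota is exhausted
print(response.choices[0].message.content)

# Fixed: catch the error so the caller can back off and retry
import os
from groq import Groq, RateLimitError
client = Groq(api_key=os.environ['GROQ_API_KEY'])
try:
    response = client.chat.completions.create(
        model='llama-3.1-8b-instant',  # example model ID; substitute your own
        messages=[{'role': 'user', 'content': 'Hello world'}],
        max_tokens=1000,
    )
    print(response.choices[0].message.content)
except RateLimitError:
    print('Rate limit exceeded, retrying after delay...')
    # Implement retry with backoff here
Workaround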
Catch RateLimitError exceptions and implement a delay or exponential backoff before retrying requests to avoid immediate repeated failures.
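A minimal sketch of such a retry loop follows, using exponential backoff with full jitter. It is library-agnostic: `call_with_backoff` is a hypothetical helper, `call` is any zero-argument function that issues the request, and `retry_on` would be `(groq.RateLimitError,)` in this context. The default delays are illustrative, not recommended values.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on the given exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # The ceiling doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: a random wait below the ceiling spreads out retries.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

The jitter matters when several workers hit the limit at the same moment, since identical delays would make them all retry together and fail again.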
Prevention
Track token usage per minute and implement client-side throttling or pacing to ensure requests stay within Groq's tokens per minute quota, preventing 429 errors.
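Client-side tracking of this kind can be sketched as a sliding window over recent token spend. The `TokenWindow` class below is hypothetical, the quota value is whatever applies to your account, and token counts per request are assumed to be estimated by the caller (for example prompt size plus max tokens). It is not thread-safe as written.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class TokenWindow:
    """Tracks tokens spent in the trailing window and reports how long to wait."""

    def __init__(
        self,
        tokens_per_minute: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if tokens_per_minute <= 0:
            raise ValueError("tokens_per_minute must be positive")
        self.limit = tokens_per_minute
        self.window = window_seconds
        self.clock = clock
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self.used = 0

    def _expire(self, now: float) -> None:
        # Drop spend that has aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def wait_time(self, tokens: int) -> float:
        """Seconds until `tokens` more would fit; 0.0 if they fit now."""
        if tokens > self.limit:
            raise ValueError("request is larger than the whole quota")
        now = self.clock()
        self._expire(now)
        excess = self.used + tokens - self.limit
        if excess <= 0:
            return 0.0
        freed = 0
        for stamp, spent in self.events:
            freed += spent
            if freed >= excess:
                # Enough room opens up once this event leaves the window.
                return stamp + self.window - now
        return 0.0  # not reached: recorded events always cover the excess

    def record(self, tokens: int) -> None:
        """Register tokens as spent at the current time."""
        self.events.append((self.clock(), tokens))
        self.used += tokens
```

Before each request, a caller would sleep for `wait_time(estimate)` and then call `record(estimate)`. Because the server's accounting may differ from the local estimate, this reduces 429 responses rather than guaranteeing none, so a retry path is still worth keeping.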