How to implement retry logic for AI API calls
Quick answer
Implement retry logic for AI API calls by wrapping the call in a loop that catches transient exceptions such as timeouts or rate limits, then retries after a delay. Use exponential backoff with jitter to avoid overwhelming the API and to get reliable responses from models such as gpt-4o or claude-3-5-sonnet-20241022.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the OpenAI Python SDK and set your API key as an environment variable for secure authentication.
pip install "openai>=1.0"
Step by step
This example shows how to implement retry logic with exponential backoff for OpenAI's gpt-4o model using the official SDK. It makes up to 3 attempts, retrying on transient errors such as rate limits or network timeouts.
import os
import random
import time
from openai import OpenAI, OpenAIError, RateLimitError, APITimeoutError

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

MAX_RETRIES = 3

def call_ai_with_retry(prompt):
    delay = 1  # initial delay in seconds
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except (RateLimitError, APITimeoutError, OpenAIError) as e:
            if attempt == MAX_RETRIES:
                raise
            # Add jitter so concurrent clients do not retry in lockstep
            sleep_for = delay + random.uniform(0, delay)
            print(f"Attempt {attempt} failed: {e}. Retrying in {sleep_for:.1f} seconds...")
            time.sleep(sleep_for)
            delay *= 2  # exponential backoff

if __name__ == "__main__":
    prompt = "Explain retry logic for AI API calls."
    result = call_ai_with_retry(prompt)
    print("AI response:", result)
Output
AI response: Retry logic involves catching transient errors like rate limits or timeouts and retrying the request after a delay, often increasing the delay exponentially to avoid overwhelming the API.
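The backoff schedule itself can be factored into a small, provider-agnostic helper. Below is a minimal sketch with "full jitter" (each wait is drawn uniformly between zero and the capped exponential delay); the helper name `backoff_delays` and its defaults are illustrative, not part of any SDK:

```python
import random

def backoff_delays(base=1.0, factor=2.0, max_delay=30.0, retries=5):
    """Yield exponential backoff delays, capped at max_delay, with full jitter."""
    for attempt in range(retries):
        capped = min(max_delay, base * factor ** attempt)
        # Full jitter: wait a random amount between 0 and the capped delay
        yield random.uniform(0, capped)

# The un-jittered caps double each attempt: 1, 2, 4, 8, 16 seconds
caps = [min(30.0, 2.0 ** i) for i in range(5)]
delays = list(backoff_delays())
```

Full jitter spreads simultaneous clients across the whole delay window, which is why it is often preferred over adding a small random offset to a fixed schedule.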
Common variations
You can adapt retry logic for asynchronous calls, streaming responses, or other AI providers like Anthropic. For example, use asyncio.sleep() for async retries or adjust error types based on the SDK.
import asyncio
import os
import anthropic

# Use the async client so messages.create() can be awaited
client = anthropic.AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

MAX_RETRIES = 3

async def call_anthropic_with_retry(prompt):
    delay = 1
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            response = await client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=500,
                system="You are a helpful assistant.",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.content[0].text
        except (anthropic.RateLimitError, anthropic.APITimeoutError, anthropic.APIError) as e:
            if attempt == MAX_RETRIES:
                raise
            print(f"Attempt {attempt} failed: {e}. Retrying in {delay} seconds...")
            await asyncio.sleep(delay)
            delay *= 2

if __name__ == "__main__":
    prompt = "Explain retry logic for AI API calls asynchronously."
    result = asyncio.run(call_anthropic_with_retry(prompt))
    print("AI response:", result)
Output
AI response: Retry logic for AI API calls asynchronously involves catching exceptions and retrying with increasing delays using async sleep to avoid blocking the event loop.
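The async retry pattern can also be exercised without touching a real API by swapping in a stand-in coroutine. Here is a minimal, self-contained sketch; `retry_async` and `flaky_call` are hypothetical names, and the stand-in fails twice before succeeding:

```python
import asyncio

async def retry_async(coro_fn, *, retries=3, delay=0.01):
    """Retry an async callable with exponential backoff."""
    for attempt in range(1, retries + 1):
        try:
            return await coro_fn()
        except Exception:
            if attempt == retries:
                raise
            await asyncio.sleep(delay)
            delay *= 2

calls = {"count": 0}

async def flaky_call():
    # Hypothetical stand-in for an API call: fails twice, then succeeds
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("transient network error")
    return "ok"

result = asyncio.run(retry_async(flaky_call))
```

Testing the loop against a fake like this verifies the retry counting and backoff flow before wiring in a real SDK client.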
Troubleshooting
- If you encounter persistent rate limit errors, increase the backoff delay or reduce request frequency.
- For network timeouts, verify your internet connection and consider increasing timeout settings if supported.
- Log errors and retry attempts to diagnose issues effectively.
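The last point can be folded directly into the retry loop with the standard logging module. A minimal sketch, using a hypothetical flaky function to stand in for the API call:

```python
import logging
import time

logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(message)s")
log = logging.getLogger("retry")

def call_with_logged_retries(fn, retries=3, delay=0.01):
    """Call fn(), logging each failed attempt before backing off."""
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, retries, exc)
            if attempt == retries:
                log.error("giving up after %d attempts", retries)
                raise
            time.sleep(delay)
            delay *= 2

attempts = []

def sometimes_fails():
    # Hypothetical flaky call: fails once, then succeeds
    attempts.append(1)
    if len(attempts) < 2:
        raise ConnectionError("transient")
    return "done"

result = call_with_logged_retries(sometimes_fails)
```

Structured log lines like these make it easy to spot whether failures are sporadic (jittered retries succeed) or persistent (every call exhausts its attempts).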
Key Takeaways
- Use exponential backoff with jitter to handle transient API errors gracefully.
- Catch specific exceptions like rate limits and timeouts to trigger retries.
- Adapt retry logic for async or streaming calls depending on your SDK and use case.