How-to · Intermediate · 3 min read

How to implement retry logic for AI API calls

Quick answer
Implement retry logic for AI API calls by wrapping the call in a loop that catches transient exceptions like timeouts or rate limits, then retries after a delay. Use exponential backoff with jitter to avoid overwhelming the API and ensure reliable responses from models like gpt-4o or claude-3-5-sonnet-20241022.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the OpenAI Python SDK and set your API key as an environment variable so it never appears in your source code.

bash
pip install "openai>=1.0"
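
The key itself goes into an environment variable before you run any code. On macOS/Linux that looks like this (replace the placeholder with your real key; the variable persists only for the current shell session):

```shell
# Replace the placeholder with your actual key from the OpenAI dashboard.
# This lasts only for the current shell session; add it to your shell profile to persist it.
export OPENAI_API_KEY="sk-your-key-here"
```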

Step by step

This example shows how to implement retry logic with exponential backoff for OpenAI's gpt-4o model using the official SDK. It retries up to 3 times on transient errors like rate limits or network issues.

python
import os
import time
from openai import OpenAI
from openai import APIConnectionError, APITimeoutError, RateLimitError

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

MAX_RETRIES = 3


def call_ai_with_retry(prompt):
    delay = 1  # initial delay in seconds
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}]
            )
            return response.choices[0].message.content
        except (RateLimitError, APITimeoutError, APIConnectionError) as e:
            if attempt == MAX_RETRIES:
                raise
            print(f"Attempt {attempt} failed: {e}. Retrying in {delay} seconds...")
            time.sleep(delay)
            delay *= 2  # exponential backoff


if __name__ == "__main__":
    prompt = "Explain retry logic for AI API calls."
    result = call_ai_with_retry(prompt)
    print("AI response:", result)

output
AI response: Retry logic involves catching transient errors like rate limits or timeouts and retrying the request after a delay, often increasing the delay exponentially to avoid overwhelming the API.
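
The quick answer above recommends adding jitter on top of exponential backoff, which the example omits for brevity. A minimal sketch of the "full jitter" strategy, where each retry sleeps a random amount up to the exponential ceiling (the helper name and defaults here are illustrative, not part of any SDK):

```python
import random


def full_jitter(attempt, base=1.0, cap=30.0):
    """Return a random delay in [0, min(cap, base * 2**attempt)] ("full jitter")."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


# In the retry loop above, sleep for full_jitter(attempt) instead of a fixed
# exponential delay, so many concurrent clients don't all retry at the same instant.
for attempt in range(4):
    print(f"attempt {attempt}: sleep up to {min(30.0, 1.0 * 2 ** attempt):.0f}s, "
          f"chose {full_jitter(attempt):.2f}s")
```

Randomizing the delay spreads retries out over time, which matters most when a rate-limit error has hit many clients simultaneously.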

Common variations

You can adapt retry logic for asynchronous calls, streaming responses, or other AI providers like Anthropic. For example, use asyncio.sleep() for async retries, switch to the provider's async client (e.g. AsyncAnthropic), and adjust the caught error types to match the SDK.

python
import os
import asyncio
import anthropic

client = anthropic.AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

MAX_RETRIES = 3

async def call_anthropic_with_retry(prompt):
    delay = 1
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            response = await client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=500,
                system="You are a helpful assistant.",
                messages=[{"role": "user", "content": prompt}]
            )
            return response.content[0].text
        except (anthropic.RateLimitError, anthropic.APIConnectionError, anthropic.InternalServerError) as e:
            if attempt == MAX_RETRIES:
                raise
            print(f"Attempt {attempt} failed: {e}. Retrying in {delay} seconds...")
            await asyncio.sleep(delay)
            delay *= 2


if __name__ == "__main__":
    prompt = "Explain retry logic for AI API calls asynchronously."
    result = asyncio.run(call_anthropic_with_retry(prompt))
    print("AI response:", result)

output
AI response: Retry logic for AI API calls asynchronously involves catching exceptions and retrying with increasing delays using async sleep to avoid blocking the event loop.
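
Rather than repeating this loop for every provider, the pattern can be factored into a reusable decorator. This is a sketch under assumptions, not part of either SDK; `flaky_call` is a stand-in for a real API call, and in practice you would pass the SDK's transient error classes via `exceptions`:

```python
import asyncio
import functools
import random


def retry_async(max_retries=3, base_delay=1.0, exceptions=(Exception,)):
    """Decorator sketch: retry an async function with exponential backoff plus jitter."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_retries + 1):
                try:
                    return await fn(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise
                    # Sleep a jittered, exponentially growing delay between attempts.
                    await asyncio.sleep(delay + random.uniform(0, delay))
                    delay *= 2
        return wrapper
    return decorator


@retry_async(max_retries=3, base_delay=0.01, exceptions=(TimeoutError,))
async def flaky_call(state):
    """Stand-in for a real API call: fails once, then succeeds."""
    state["n"] += 1
    if state["n"] < 2:
        raise TimeoutError("simulated timeout")
    return "ok"


if __name__ == "__main__":
    print(asyncio.run(flaky_call({"n": 0})))  # prints "ok" after one retry
```

The decorator keeps the retry policy in one place, so swapping providers only means changing the `exceptions` tuple.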

Troubleshooting

  • If you encounter persistent rate limit errors, increase the backoff delay or reduce request frequency.
  • For network timeouts, verify your internet connection and consider increasing timeout settings if supported.
  • Log errors and retry attempts to diagnose issues effectively.
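
The last bullet can be sketched with the standard logging module. The retry loop mirrors the examples above; `with_retry` and `flaky` are illustrative names, and `flaky` stands in for a real API call:

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("retry")


def with_retry(fn, max_retries=3, delay=1.0):
    """Call fn(); log each failed attempt, then retry after an exponentially growing delay."""
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_retries:
                log.error("attempt %d/%d failed, giving up: %s", attempt, max_retries, exc)
                raise
            log.warning("attempt %d/%d failed (%s); retrying in %.1fs",
                        attempt, max_retries, exc, delay)
            time.sleep(delay)
            delay *= 2


calls = {"n": 0}


def flaky():
    """Stand-in for a real API call: fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


if __name__ == "__main__":
    print(with_retry(flaky, delay=0.1))  # logs two warnings, then prints "ok"
```

With timestamps on each warning you can see at a glance whether failures cluster (a rate-limit burst) or are spread out (flaky network).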

Key Takeaways

  • Use exponential backoff with jitter to handle transient API errors gracefully.
  • Catch specific exceptions like rate limits and timeouts to trigger retries.
  • Adapt retry logic for async or streaming calls depending on your SDK and use case.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022