How-to · beginner · 3 min read

How to retry OpenAI API calls with backoff

Quick answer
Use exponential backoff: catch exceptions raised by client.chat.completions.create, wait an increasing delay after each failure, and retry. Implement this with a loop or a decorator to handle transient errors such as rate limits or network issues.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable.

bash
pip install "openai>=1.0"  # quotes keep the shell from treating ">" as a redirect
export OPENAI_API_KEY="sk-..."

Step by step

This example shows how to call the OpenAI gpt-4o model with retry logic using exponential backoff for transient errors like rate limits or network failures.

python
import os
import time
from openai import OpenAI
from openai import OpenAIError

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Retry parameters
max_retries = 5
initial_delay = 1  # seconds

messages = [{"role": "user", "content": "Hello, retry with backoff!"}]

for attempt in range(max_retries):
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages
        )
        print("Response:", response.choices[0].message.content)
        break  # Success, exit loop
    except OpenAIError as e:
        print(f"Attempt {attempt + 1} failed: {e}")
        if attempt == max_retries - 1:
            print("Max retries reached. Exiting.")
            raise
        sleep_time = initial_delay * (2 ** attempt)  # Exponential backoff
        print(f"Retrying in {sleep_time} seconds...")
        time.sleep(sleep_time)

output
Response: <the model's reply; exact text varies per run>

Common variations

  • Use AsyncOpenAI with async functions and asyncio.sleep for asynchronous retries.
  • Adjust max_retries and initial_delay based on your app's tolerance for latency.
  • Use different models like gpt-4o-mini by changing the model parameter.
  • Wrap retry logic in a reusable decorator or function for cleaner code.

python
import asyncio
import os
from openai import AsyncOpenAI, OpenAIError

# Async calls go through AsyncOpenAI; completions.create is awaited
# directly (there is no separate acreate method in the v1 SDK)
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def call_openai_with_retry(messages, max_retries=5, initial_delay=1):
    for attempt in range(max_retries):
        try:
            response = await client.chat.completions.create(
                model="gpt-4o",
                messages=messages
            )
            return response.choices[0].message.content
        except OpenAIError as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt == max_retries - 1:
                raise
            sleep_time = initial_delay * (2 ** attempt)
            print(f"Retrying in {sleep_time} seconds...")
            await asyncio.sleep(sleep_time)

# Usage example
# asyncio.run(call_openai_with_retry([{"role": "user", "content": "Hello async retry!"}]))
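
The last bullet can be sketched as a generic decorator. Here retry_with_backoff, its parameters, and the stand-in flaky function are illustrative names, not part of the OpenAI SDK:

```python
import functools
import time

def retry_with_backoff(max_retries=5, initial_delay=1, exceptions=(Exception,)):
    """Retry the wrapped function with exponential backoff on the given exceptions."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries - 1:
                        raise
                    time.sleep(initial_delay * (2 ** attempt))
        return wrapper
    return decorator

# Demo with a stand-in function that fails twice, then succeeds
calls = {"n": 0}

@retry_with_backoff(max_retries=3, initial_delay=0.01, exceptions=(RuntimeError,))
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = flaky()
print(result)  # "ok" after two retried failures
```

For real API calls you would pass exceptions=(OpenAIError,) and decorate a function that wraps client.chat.completions.create.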

Troubleshooting

  • If you get persistent RateLimitError, reduce request frequency or increase backoff delay.
  • For APITimeoutError, check network stability and consider raising the client's timeout setting (the v1 client accepts a timeout argument).
  • Catching OpenAIError, the SDK's base exception, handles all API errors in one place; catch subclasses such as RateLimitError separately when they need different treatment.
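
If many workers hit a rate limit at once, a fixed exponential schedule makes them all retry in lockstep; adding random jitter and a cap spreads them out. A sketch, with backoff_delay and its cap as illustrative choices rather than anything from the SDK:

```python
import random

def backoff_delay(attempt, initial_delay=1, max_delay=30):
    """Exponential delay with a cap and full jitter."""
    base = min(initial_delay * (2 ** attempt), max_delay)
    return random.uniform(0, base)

# In the loop above, replace the fixed sleep_time with:
# time.sleep(backoff_delay(attempt))
print([round(backoff_delay(a), 2) for a in range(5)])
```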

Key takeaways

  • Use exponential backoff with increasing delays to handle transient OpenAI API errors robustly.
  • Catch OpenAIError exceptions to detect failures and trigger retries.
  • Adjust retry parameters based on your application's latency tolerance and error patterns.
Verified 2026-04 · gpt-4o, gpt-4o-mini