
How to handle LLM timeouts gracefully

Quick answer
Handle LLM timeouts gracefully by wrapping your API calls in retry logic with exponential backoff, catching timeout and rate-limit errors so you can retry, fall back, or notify users without crashing your app.
⚡ QUICK FIX
Add exponential backoff retry logic around your API call to handle TimeoutError automatically.

Why this happens

LLM timeouts occur when an API request exceeds the server's response-time limit or the network stalls. Common triggers are large prompts, high server load, and slow connections. In the OpenAI Python SDK the failure surfaces as openai.APITimeoutError, or openai.RateLimitError when the server is shedding load.

Example of broken code without timeout handling:

python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Generate a long text..."}]
)
print(response.choices[0].message.content)
output
openai.APITimeoutError: Request timed out.

The fix

Wrap your API call in a retry loop with exponential backoff to handle transient timeouts. Each retry waits longer than the last, reducing load and giving the server time to recover. Catch APITimeoutError and RateLimitError specifically, so genuine bugs still fail loudly instead of being silently retried.

python
import os
import time

from openai import APITimeoutError, OpenAI, RateLimitError

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

max_retries = 5
base_delay = 1  # seconds

for attempt in range(max_retries):
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Generate a long text..."}]
        )
        print(response.choices[0].message.content)
        break
    except (APITimeoutError, RateLimitError) as e:
        if attempt == max_retries - 1:
            print(f"Failed after {max_retries} attempts: {e}")
            # Optional: fall back or notify the user here
        else:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s...
output
Long generated text output here...

Preventing it in production

  • Implement retries with exponential backoff and jitter to avoid thundering herd problems.
  • Validate prompt size and complexity to keep requests efficient.
  • Use circuit breakers or fallback models to maintain user experience during outages.
  • Monitor API latency and error rates to proactively adjust retry policies.
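The backoff-with-jitter advice from the first bullet fits in a small helper. A sketch; backoff_delay is a name invented here for illustration, not an SDK function:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)].

    Randomizing the wait spreads retries from many clients over time,
    avoiding the thundering-herd spike of everyone retrying in lockstep.
    """
    return random.uniform(0, min(cap, base * 2 ** attempt))

# Delays grow with the attempt number but stay bounded by the cap:
print([round(backoff_delay(a), 2) for a in range(5)])
```

Replacing a fixed `time.sleep(base * 2 ** attempt)` with `time.sleep(backoff_delay(attempt))` is usually all the change a retry loop needs.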

Key Takeaways

  • Use exponential backoff retry logic to handle transient LLM timeouts effectively.
  • Catch and handle specific API errors like APITimeoutError and RateLimitError to maintain app stability.
  • Validate prompt size and monitor API usage to prevent frequent timeouts.
  • Implement fallback strategies to ensure graceful degradation during persistent failures.
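The fallback strategy from the last takeaway reduces to a small pattern: try the primary call, and on a retriable error return a cheaper answer instead of crashing. A minimal sketch; the helper name and the canned reply are illustrative, not part of any SDK:

```python
def with_fallback(primary, fallback, retriable=(TimeoutError,)):
    """Call primary(); if it raises a retriable error, return fallback() instead."""
    try:
        return primary()
    except retriable:
        return fallback()

def flaky_llm_call():
    # Stand-in for an API call that times out.
    raise TimeoutError("simulated upstream timeout")

# Degrade to a canned response instead of crashing the request handler.
print(with_fallback(flaky_llm_call, lambda: "Sorry, try again shortly."))
```

In production the fallback might be a cached answer, a smaller model, or a polite error message; the structure stays the same.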
Verified 2026-04 · gpt-4o