
How to handle LLM timeouts gracefully

Quick answer
Handle LLM timeouts gracefully by wrapping your API calls in retry logic with exponential backoff, catching timeout and rate-limit errors so you can retry, fall back, or notify users without crashing your app.
⚡ QUICK FIX
Add exponential backoff retry logic around your API call to handle TimeoutError automatically.

Why this happens

LLM timeouts occur when an API request exceeds the server's response-time limit or the network stalls. Common triggers are large prompts, high server load, and slow connections. In the OpenAI Python SDK the failure surfaces as openai.APITimeoutError, or openai.RateLimitError when the server is shedding load.

Example of broken code without timeout handling:

python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Generate a long text..."}]
)
print(response.choices[0].message.content)
output
openai.APITimeoutError: Request timed out.

The fix

Wrap your API call in a retry loop with exponential backoff to handle transient timeouts. Each retry waits longer than the last, reducing load and giving the server time to recover. Catch APITimeoutError and RateLimitError specifically, so genuine bugs still fail loudly instead of being silently retried.

python
import os
import time

from openai import APITimeoutError, OpenAI, RateLimitError

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

max_retries = 5
base_delay = 1  # seconds

for attempt in range(max_retries):
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Generate a long text..."}]
        )
        print(response.choices[0].message.content)
        break
    except (APITimeoutError, RateLimitError) as e:
        if attempt == max_retries - 1:
            print(f"Failed after {max_retries} attempts: {e}")
            # Optional: fall back or notify the user here
        else:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s...
output
Long generated text output here...

Preventing it in production

  • Implement retries with exponential backoff and jitter to avoid thundering herd problems.
  • Validate prompt size and complexity to keep requests efficient.
  • Use circuit breakers or fallback models to maintain user experience during outages.
  • Monitor API latency and error rates to proactively adjust retry policies.
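The backoff-with-jitter advice from the first bullet fits in a small helper. A sketch; backoff_delay is a name invented here for illustration, not an SDK function:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)].

    Randomizing the wait spreads retries from many clients over time,
    avoiding the thundering-herd spike of everyone retrying in lockstep.
    """
    return random.uniform(0, min(cap, base * 2 ** attempt))

# Delays grow with the attempt number but stay bounded by the cap:
print([round(backoff_delay(a), 2) for a in range(5)])
```

Replacing a fixed `time.sleep(base * 2 ** attempt)` with `time.sleep(backoff_delay(attempt))` is usually all the change a retry loop needs.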

Key Takeaways

  • Use exponential backoff retry logic to handle transient LLM timeouts effectively.
  • Catch and handle specific API errors like APITimeoutError and RateLimitError to maintain app stability.
  • Validate prompt size and monitor API usage to prevent frequent timeouts.
  • Implement fallback strategies to ensure graceful degradation during persistent failures.
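The fallback strategy from the last takeaway reduces to a small pattern: try the primary call, and on a retriable error return a cheaper answer instead of crashing. A minimal sketch; the helper name and the canned reply are illustrative, not part of any SDK:

```python
def with_fallback(primary, fallback, retriable=(TimeoutError,)):
    """Call primary(); if it raises a retriable error, return fallback() instead."""
    try:
        return primary()
    except retriable:
        return fallback()

def flaky_llm_call():
    # Stand-in for an API call that times out.
    raise TimeoutError("simulated upstream timeout")

# Degrade to a canned response instead of crashing the request handler.
print(with_fallback(flaky_llm_call, lambda: "Sorry, try again shortly."))
```

In production the fallback might be a cached answer, a smaller model, or a polite error message; the structure stays the same.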
Verified 2026-04 · gpt-4o