
How to handle LLM timeouts gracefully

Quick answer
Handle LLM timeouts gracefully by wrapping your API calls in retry logic with exponential backoff that catches the SDK's timeout and connection errors (openai.APITimeoutError and openai.APIConnectionError). Also set an explicit request timeout and return a fallback response when retries run out, so users are never left waiting on a stalled request.
⚡ QUICK FIX
Add exponential backoff retry logic around your API call to handle openai.APITimeoutError automatically.

Why this happens

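LLM timeouts occur when a request takes longer than the client's (or server's) timeout threshold, usually because of network latency, a heavily loaded API, or a long prompt or response that takes a long time to process. In the OpenAI Python SDK (v1 and later), this raises openai.APITimeoutError, a subclass of openai.APIConnectionError, not Python's built-in TimeoutError. The client already retries these failures twice by default (max_retries=2) with a default timeout of 10 minutes, so by the time the exception reaches your code the user may have been waiting a long time. For example, a simple call with no retry handling of its own: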

python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)
print(response.choices[0].message.content)
output
Traceback (most recent call last):
  File "app.py", line 6, in <module>
    response = client.chat.completions.create(
  ...
openai.APITimeoutError: Request timed out.
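Separately from the SDK's own timeout setting, you may want a hard upper bound on how long a user waits for any blocking call. A minimal standard-library sketch, assuming a hypothetical `slow_llm_call` stand-in for the real API request: the call runs in a worker thread, and `Future.result(timeout=...)` stops waiting once the deadline passes.

```python
import concurrent.futures
import time

# Shared worker pool; threads are reused across calls.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def slow_llm_call(prompt: str, latency: float) -> str:
    """Hypothetical stand-in for a blocking LLM request that takes `latency` s."""
    time.sleep(latency)
    return f"answer to: {prompt}"


def call_with_deadline(fn, *args, deadline: float):
    """Run fn(*args) in a worker thread and wait at most `deadline` seconds.

    Raises concurrent.futures.TimeoutError when the deadline passes. Python
    cannot kill the worker thread, so a request that is already running keeps
    going in the background and its result is simply discarded.
    """
    future = _executor.submit(fn, *args)
    try:
        return future.result(timeout=deadline)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only has an effect if the call has not started yet
        raise


if __name__ == "__main__":
    print(call_with_deadline(slow_llm_call, "fast question", 0.1, deadline=1.0))
    try:
        call_with_deadline(slow_llm_call, "slow question", 1.0, deadline=0.2)
    except concurrent.futures.TimeoutError:
        print("No answer within 0.2 s; showing a fallback instead.")
```

Because an abandoned request still occupies a pool thread until it finishes, this works best alongside the SDK's own timeout rather than as a replacement for it.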

The fix

Wrap your LLM API call in a retry loop with exponential backoff. After each timeout the loop waits longer before trying again (1 s, then 2 s, and so on), which gives a congested network or overloaded server time to recover instead of hammering it with immediate retries.
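The delay before retry number n is usually base * factor**n, often capped and randomized ("jitter") so that many clients retrying at the same moment do not all hit the server together. A small sketch of that schedule; the parameter names and the 30-second cap are illustrative assumptions, not values taken from any SDK.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, factor: float = 2.0,
                  cap: float = 30.0, jitter: bool = True) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Grows exponentially (base * factor**attempt) and never exceeds `cap`.
    With jitter=True it returns a uniform random value in [0, that bound]
    ("full jitter") to spread out simultaneous retries.
    """
    bound = min(cap, base * factor ** attempt)
    return random.uniform(0, bound) if jitter else bound


if __name__ == "__main__":
    print([backoff_delay(n, jitter=False) for n in range(7)])
    # → [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```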

Example using time.sleep and catching openai.APITimeoutError:

python
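import os
import time

from openai import OpenAI, APITimeoutError

# Set an explicit per-request timeout (seconds) instead of the 10-minute default,
# and turn off the SDK's built-in retries so this loop alone controls retrying.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"], timeout=30.0, max_retries=0)

max_attempts = 3    # total attempts, i.e. up to 2 retries
backoff_factor = 2  # waits 1 s, then 2 s

for attempt in range(max_attempts):
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Explain quantum computing"}],
        )
        print(response.choices[0].message.content)
        break
    # The SDK raises APITimeoutError, not the built-in TimeoutError. It subclasses
    # APIConnectionError; catch that instead to also retry dropped connections.
    except APITimeoutError:
        if attempt < max_attempts - 1:  # no point sleeping after the last attempt
            wait = backoff_factor ** attempt
            print(f"Timeout, retrying in {wait} seconds...")
            time.sleep(wait)
else:
    # Runs only if the loop never hit `break`, i.e. every attempt timed out.
    print("Failed after retries")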
output
Timeout, retrying in 1 seconds...
Timeout, retrying in 2 seconds...
Quantum computing is a field of computing focused on developing computer technology based on the principles of quantum theory...
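If several parts of an application call the model, the loop above can be factored into a reusable helper. The sketch below is a generic pattern, not part of the OpenAI SDK: `retry_with_backoff`, its parameters, and the `flaky` demo function are hypothetical, and the demo uses the built-in TimeoutError only as a stand-in. With the real client you would pass the SDK's exception classes, for example `retry_on=(openai.APITimeoutError,)`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base: float = 1.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn() up to max_attempts times, sleeping between failed attempts.

    Only exceptions listed in retry_on are retried; anything else propagates
    immediately. The last failure is re-raised once attempts run out.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            # Exponential backoff with full jitter, capped at `cap` seconds.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        # Hypothetical stand-in: times out twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TimeoutError("simulated timeout")
        return "ok"

    print(retry_with_backoff(flaky, retry_on=(TimeoutError,), sleep=lambda s: None))
    # → ok
```

Injecting `sleep` keeps the helper testable without real waiting, and restricting `retry_on` to transient errors avoids retrying requests that will never succeed, such as authentication failures.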

Preventing it in production

  • Use client-side timeouts and retries with exponential backoff to handle transient network issues.
  • Limit prompt size and requested output length to keep processing time well under your timeout.
  • Implement fallback responses or cached answers to maintain user experience during outages.
  • Monitor API latency and error rates to proactively adjust retry policies.
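The fallback and caching advice above can be sketched as a thin wrapper: try the live call, remember successful answers, and when the call fails with a transient error serve the last good answer for the same prompt, or a fixed message if there is none. This is an illustrative pattern under simple assumptions (an in-memory dict keyed by the exact prompt, a hypothetical `llm_call` function and `FALLBACK_MESSAGE`); production systems would typically use a shared cache with expiry.

```python
from typing import Callable, Dict, Tuple, Type

FALLBACK_MESSAGE = "The assistant is busy right now. Please try again in a moment."


class FallbackLLM:
    """Wraps a blocking LLM call with a last-good-answer cache and a fallback."""

    def __init__(self, llm_call: Callable[[str], str],
                 transient_errors: Tuple[Type[BaseException], ...]):
        self.llm_call = llm_call              # hypothetical: prompt -> answer
        self.transient_errors = transient_errors
        self.cache: Dict[str, str] = {}       # exact prompt -> last good answer

    def ask(self, prompt: str) -> Tuple[str, str]:
        """Return (answer, source) where source is 'live', 'cache' or 'fallback'."""
        try:
            answer = self.llm_call(prompt)
        except self.transient_errors:
            if prompt in self.cache:
                return self.cache[prompt], "cache"
            return FALLBACK_MESSAGE, "fallback"
        self.cache[prompt] = answer
        return answer, "live"


if __name__ == "__main__":
    outage = {"on": False}

    def fake_llm(prompt: str) -> str:
        # Hypothetical stand-in for the real API call.
        if outage["on"]:
            raise TimeoutError("simulated timeout")
        return f"answer to {prompt!r}"

    bot = FallbackLLM(fake_llm, transient_errors=(TimeoutError,))
    print(bot.ask("Explain quantum computing"))  # live answer, now cached
    outage["on"] = True
    print(bot.ask("Explain quantum computing"))  # served from the cache
    print(bot.ask("Explain entanglement"))       # no cache entry: fallback message
```

Returning the source alongside the answer lets the UI flag stale or fallback responses, and counting how often each source is used gives a simple signal for the latency and error-rate monitoring mentioned above.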

Key Takeaways

  • Implement exponential backoff retries to handle transient LLM timeouts effectively.
  • Limit input size and complexity to reduce the processing delays that cause timeouts.
  • Use fallback responses or caching to maintain smooth user experience during failures.
Verified 2026-04 · gpt-4o-mini