How to handle LLM timeouts gracefully

Handle LLM timeouts gracefully by setting an explicit client-side timeout and wrapping your API calls in retry logic with exponential backoff. In the OpenAI Python SDK (v1 and later), the exception to catch is APITimeoutError, a subclass of APIConnectionError. Add fallback responses so that a slow or failed call does not block the user experience.

ERROR TYPE api_error

⚡ QUICK FIX

Add exponential backoff retry logic around your API call so that APITimeoutError is caught and the request is retried automatically.

Why this happens

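LLM timeouts occur when a request exceeds the client-side or server-side timeout threshold, often because of network latency, heavy load on the provider, or long prompts and completions that take a long time to process. In the OpenAI Python SDK (v1 and later), a client-side timeout raises openai.APITimeoutError, which is a subclass of openai.APIConnectionError. Python's built-in TimeoutError is not what the client raises, so catching it alone will miss these failures. For example, a simple call with no explicit timeout or error handling: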

python

from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)
print(response.choices[0].message.content)

output

Traceback (most recent call last):
  File "app.py", line 6, in <module>
    response = client.chat.completions.create(
  ... (SDK-internal frames omitted)
openai.APITimeoutError: Request timed out.
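To make the idea of a client-side timeout threshold concrete, here is a minimal sketch that needs no network access. It uses only the standard library; slow_model_call and call_with_deadline are hypothetical names, and the 0.5 second delay stands in for a slow API response. It is an illustration of the mechanism, not how any particular SDK implements its timeout.

```python
import concurrent.futures
import time


def slow_model_call() -> str:
    """Stand-in for a request that the server takes 0.5 s to answer."""
    time.sleep(0.5)
    return "done"


def call_with_deadline(timeout_s: float) -> str:
    """Wait at most timeout_s seconds for slow_model_call to finish."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_model_call)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            # The caller gave up first; the worker thread still runs to completion.
            return "timed out"


print(call_with_deadline(0.1))  # threshold shorter than the response time
print(call_with_deadline(2.0))  # threshold longer than the response time
```

The first call reports a timeout because the caller stops waiting after 0.1 seconds, even though the work itself would have succeeded. That is why a timeout is usually worth retrying rather than treating as a permanent failure.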

The fix

Wrap your LLM API call in a retry loop with exponential backoff to handle transient timeouts. This retries the request after increasing delays, reducing load spikes and improving success rates.

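Example with time.sleep and catching openai.APITimeoutError, the exception the OpenAI Python SDK (v1 and later) raises on a client-side timeout:

python

import os
import time

from openai import APITimeoutError, OpenAI

# timeout caps each request (20 seconds is an illustrative value).
# max_retries=0 turns off the SDK's built-in retries (2 by default),
# so the loop below is the only retry layer.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    timeout=20.0,
    max_retries=0,
)

max_retries = 3
backoff_factor = 2

for attempt in range(max_retries):
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Explain quantum computing"}],
        )
        print(response.choices[0].message.content)
        break
    except APITimeoutError:
        if attempt == max_retries - 1:
            continue  # out of attempts: fall through to the else branch
        wait = backoff_factor ** attempt  # 1 s, then 2 s
        print(f"Timeout, retrying in {wait} seconds...")
        time.sleep(wait)
else:
    # Runs only when the loop finished without hitting break.
    print("Failed after retries")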

output

Timeout, retrying in 1 seconds...
Timeout, retrying in 2 seconds...
Quantum computing is a field of computing focused on developing computer technology based on the principles of quantum theory...
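The same pattern can be factored into a reusable helper. The sketch below is one possible shape, not a definitive implementation: retry_with_backoff and flaky_call are hypothetical names, the delay cap and the random jitter are additions beyond the loop above, and the built-in TimeoutError is used as a stand-in so the example runs without network access. With the OpenAI SDK you would pass retry_on=(openai.APITimeoutError,) instead.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on the given exceptions with capped, jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Exponential growth (1x, 2x, 4x, ...) capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: sleep a random amount up to the ceiling so that many
            # clients do not all retry at the same instant.
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")


# Stand-in for an LLM call that times out twice, then succeeds.
outcomes = iter([TimeoutError(), TimeoutError(), "ok"])


def flaky_call() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


delays = []  # record the requested delays instead of actually sleeping
result = retry_with_backoff(flaky_call, sleep=delays.append)
print(result, len(delays))
```

Passing the sleep function as a parameter keeps the helper easy to test, since a test can record the delays rather than wait for them.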

Preventing it in production

Use client-side timeouts and retries with exponential backoff to handle transient network issues.
Validate prompt size and complexity to avoid excessive processing time.
Implement fallback responses or cached answers to maintain user experience during outages.
Monitor API latency and error rates to proactively adjust retry policies.
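The prompt-size check and the fallback idea from the list above can be sketched as follows. Everything here is illustrative: answer, MAX_PROMPT_CHARS, FALLBACK_MESSAGE and the in-memory cache are hypothetical, the character limit is an arbitrary placeholder rather than a recommended value, and the built-in TimeoutError stands in for the SDK's timeout exception so the example runs offline.

```python
from typing import Callable, Dict

MAX_PROMPT_CHARS = 8_000  # hypothetical budget; tune for your model and latency target
FALLBACK_MESSAGE = "The assistant is taking longer than usual. Please try again shortly."

_cache: Dict[str, str] = {}  # last good answer per prompt (in-memory, illustrative)


def answer(prompt: str, call_llm: Callable[[str], str]) -> str:
    """Return a fresh answer, else a cached one, else a canned fallback."""
    if len(prompt) > MAX_PROMPT_CHARS:
        # Reject oversized prompts up front instead of waiting for a timeout.
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    try:
        reply = call_llm(prompt)
    except TimeoutError:
        return _cache.get(prompt, FALLBACK_MESSAGE)
    _cache[prompt] = reply
    return reply


def healthy(prompt: str) -> str:
    """Stand-in for a successful LLM call."""
    return f"answer to: {prompt}"


def timing_out(prompt: str) -> str:
    """Stand-in for an LLM call that times out."""
    raise TimeoutError("simulated timeout")


print(answer("What is a qubit?", healthy))     # fresh answer, now cached
print(answer("What is a qubit?", timing_out))  # served from the cache
print(answer("Something new", timing_out))     # canned fallback
```

A cached answer may be stale, so in practice this trade-off suits questions whose answers change slowly. For anything time-sensitive, the canned fallback is the safer choice.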

Related errors

Error	Cause	Quick fix
RateLimitError	Too many requests in short time	Add exponential backoff retry logic
APIConnectionError	Network or server connection issues	Retry with backoff and check network
BadRequestError (InvalidRequestError before SDK v1)	Malformed request or invalid parameters	Validate the request payload before sending; do not retry
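The table implies a simple rule: retry transient failures, but do not retry requests that are themselves invalid. One way to express that rule is by HTTP status code, as in the sketch below. The is_retryable function and the exact set of codes are assumptions for illustration, not the policy of any specific SDK; 429 corresponds to rate limiting, 400 to a malformed request, and None to a timeout or dropped connection where no response arrived.

```python
from typing import Optional

# Status codes that usually indicate a transient condition (assumed policy).
RETRYABLE_STATUS = {408, 429}


def is_retryable(status_code: Optional[int]) -> bool:
    """Decide whether a failed request is worth retrying.

    status_code is None when no HTTP response arrived at all
    (timeout or dropped connection).
    """
    if status_code is None:
        return True
    return status_code in RETRYABLE_STATUS or 500 <= status_code <= 599


for code in (None, 429, 500, 400):
    print(code, is_retryable(code))
```

Retrying a 400 response only repeats the same failure, so the time is better spent validating the payload before it is sent.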

Key Takeaways

Implement exponential backoff retries to handle transient LLM timeouts effectively.
Validate input size and complexity to reduce processing delays causing timeouts.
Use fallback responses or caching to maintain smooth user experience during failures.

Verified 2026-04 · gpt-4o-mini
