Debug Fix intermediate · 3 min read

Fix LLM call timeout in workflow

Quick answer
Timeouts in LLM calls during workflows occur when network delays or slow model responses exceed the client's request timeout. Fix this by setting an explicit timeout parameter and wrapping your client.chat.completions.create() calls in retry logic so transient failures are handled gracefully.
ERROR TYPE api_error
⚡ QUICK FIX
Add exponential backoff retry logic around your API call to handle TimeoutError automatically.

Why this happens

Timeouts occur when the LLM API call exceeds the default or configured request timeout, usually because of slow model processing or network latency. In a workflow, this makes the entire process fail or hang. With the OpenAI v1 Python SDK the typical error is openai.APITimeoutError (the SDK is built on httpx, so you will not see requests exceptions). Example broken code without timeout handling:

python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# No explicit timeout or retry -- a slow response surfaces as APITimeoutError
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)
print(response.choices[0].message.content)
output
Traceback (most recent call last):
  File "workflow.py", line 8, in <module>
    response = client.chat.completions.create(...)
  ...
openai.APITimeoutError: Request timed out.

The fix

Set a timeout parameter on the API call and wrap it with retry logic using tenacity or custom exponential backoff, so transient delays do not break your workflow. The example below retries up to 3 times with exponentially increasing wait intervals.

python
from openai import OpenAI, APITimeoutError
import os
from tenacity import retry, wait_exponential, stop_after_attempt, retry_if_exception_type

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@retry(
    wait=wait_exponential(multiplier=1, min=2, max=10),  # wait 2s, 4s, 8s... capped at 10s
    stop=stop_after_attempt(3),
    retry=retry_if_exception_type(APITimeoutError)  # the v1 SDK raises this, not requests exceptions
)
def call_llm():
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Explain quantum computing"}],
        timeout=15  # per-request timeout in seconds
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    result = call_llm()
    print(result)
output
Quantum computing is a field of study focused on the development of computer technology based on the principles of quantum theory...

Preventing it in production

  • Use retry libraries like tenacity to automatically retry on timeouts or transient network errors.
  • Set reasonable timeout values on API calls to avoid indefinite hangs.
  • Implement circuit breakers or fallback logic to degrade gracefully if the LLM service is unavailable.
  • Monitor API latency and error rates to proactively adjust retry policies.
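
The circuit-breaker bullet above can be sketched in a few lines. This is a minimal illustration, not a production implementation; the CircuitBreaker class, guarded_call helper, and fallback string are all assumptions introduced here:

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows a probe call after `cooldown` seconds."""
    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the breaker opened

    def allow(self):
        if self.opened_at is None:
            return True  # closed: calls pass through
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True  # half-open: let one probe call through
        return False  # open: fail fast

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()

def guarded_call(breaker, llm_fn, fallback="Service busy, try again later."):
    """Run llm_fn through the breaker, degrading to a canned fallback on timeout."""
    if not breaker.allow():
        return fallback  # fail fast instead of hanging on a struggling service
    try:
        result = llm_fn()
        breaker.record_success()
        return result
    except TimeoutError:
        breaker.record_failure()
        return fallback
```

In practice llm_fn would be the retry-wrapped call_llm from above, so the breaker only trips after the retries themselves are exhausted.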

Key Takeaways

  • Always set explicit timeout parameters on LLM API calls to avoid indefinite waits.
  • Use retry mechanisms with exponential backoff to handle transient timeouts and network errors.
  • Monitor and log API call latencies and failures to tune retry and timeout settings effectively.
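
For the last takeaway, a small decorator is enough to capture the latency data you need to tune timeouts. A sketch using only the standard library (the logger name and message format are arbitrary choices):

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm")

def log_latency(fn):
    """Log call duration and outcome so timeout/retry settings can be tuned from real data."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            log.info("%s ok in %.2fs", fn.__name__, time.perf_counter() - start)
            return result
        except Exception:
            log.warning("%s failed after %.2fs", fn.__name__, time.perf_counter() - start)
            raise
    return wrapper
```

Applied as @log_latency on call_llm, this records every attempt's duration; if the logs show successful calls routinely taking 12-14 s, a 15 s timeout is too tight.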
Verified 2026-04 · gpt-4o-mini