Debug Fix intermediate · 3 min read

Fix LLM call timeout in workflow

Quick answer
Timeouts in LLM calls during workflows occur when network delays or slow model responses exceed the client's request timeout. Fix this by setting an explicit timeout parameter and wrapping your client.chat.completions.create() calls in retry logic so transient failures are handled gracefully.
ERROR TYPE api_error
⚡ QUICK FIX
Add exponential backoff retry logic around your API call to handle TimeoutError automatically.

Why this happens

Timeouts occur when the LLM API call exceeds the default or configured request timeout, usually because of slow model processing or network latency. In a workflow, this makes the entire process fail or hang. With the OpenAI v1 Python SDK the typical error is openai.APITimeoutError (the SDK is built on httpx, so you will not see requests exceptions). Example broken code without timeout handling:

python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# No explicit timeout or retry -- a slow response surfaces as APITimeoutError
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)
print(response.choices[0].message.content)
output
Traceback (most recent call last):
  File "workflow.py", line 8, in <module>
    response = client.chat.completions.create(...)
  ...
openai.APITimeoutError: Request timed out.

The fix

Set a timeout parameter on the API call and wrap it with retry logic using tenacity or custom exponential backoff, so transient delays do not break your workflow. The example below retries up to 3 times with exponentially increasing wait intervals.

python
from openai import OpenAI, APITimeoutError
import os
from tenacity import retry, wait_exponential, stop_after_attempt, retry_if_exception_type

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@retry(
    wait=wait_exponential(multiplier=1, min=2, max=10),  # wait 2s, 4s, 8s... capped at 10s
    stop=stop_after_attempt(3),
    retry=retry_if_exception_type(APITimeoutError)  # the v1 SDK raises this, not requests exceptions
)
def call_llm():
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Explain quantum computing"}],
        timeout=15  # per-request timeout in seconds
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    result = call_llm()
    print(result)
output
Quantum computing is a field of study focused on the development of computer technology based on the principles of quantum theory...

Preventing it in production

  • Use retry libraries like tenacity to automatically retry on timeouts or transient network errors.
  • Set reasonable timeout values on API calls to avoid indefinite hangs.
  • Implement circuit breakers or fallback logic to degrade gracefully if the LLM service is unavailable.
  • Monitor API latency and error rates to proactively adjust retry policies.
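
The circuit-breaker bullet above can be sketched in a few lines. This is a minimal illustration, not a production implementation; the CircuitBreaker class, guarded_call helper, and fallback string are all assumptions introduced here:

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows a probe call after `cooldown` seconds."""
    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the breaker opened

    def allow(self):
        if self.opened_at is None:
            return True  # closed: calls pass through
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True  # half-open: let one probe call through
        return False  # open: fail fast

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()

def guarded_call(breaker, llm_fn, fallback="Service busy, try again later."):
    """Run llm_fn through the breaker, degrading to a canned fallback on timeout."""
    if not breaker.allow():
        return fallback  # fail fast instead of hanging on a struggling service
    try:
        result = llm_fn()
        breaker.record_success()
        return result
    except TimeoutError:
        breaker.record_failure()
        return fallback
```

In practice llm_fn would be the retry-wrapped call_llm from above, so the breaker only trips after the retries themselves are exhausted.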

Key Takeaways

  • Always set explicit timeout parameters on LLM API calls to avoid indefinite waits.
  • Use retry mechanisms with exponential backoff to handle transient timeouts and network errors.
  • Monitor and log API call latencies and failures to tune retry and timeout settings effectively.
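
For the last takeaway, a small decorator is enough to capture the latency data you need to tune timeouts. A sketch using only the standard library (the logger name and message format are arbitrary choices):

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm")

def log_latency(fn):
    """Log call duration and outcome so timeout/retry settings can be tuned from real data."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            log.info("%s ok in %.2fs", fn.__name__, time.perf_counter() - start)
            return result
        except Exception:
            log.warning("%s failed after %.2fs", fn.__name__, time.perf_counter() - start)
            raise
    return wrapper
```

Applied as @log_latency on call_llm, this records every attempt's duration; if the logs show successful calls routinely taking 12-14 s, a 15 s timeout is too tight.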
Verified 2026-04 · gpt-4o-mini