Fix FastAPI LLM endpoint timeout error
Quick answer
FastAPI endpoint timeout errors when calling LLM APIs occur because the LLM response takes longer than the server or proxy timeout, or because a synchronous API call blocks the event loop. To fix this, use an async LLM client with the call properly awaited, and raise the server or reverse-proxy timeout limits if responses are legitimately slow.
ERROR TYPE
config_error

⚡ QUICK FIX
Make your LLM API calls asynchronous (use an async client and `await` the call) and increase the server's timeout settings to prevent endpoint timeouts.
Why this happens
FastAPI endpoints calling LLM APIs often face timeout errors because the server or proxy timeout (e.g., 60 seconds) is exceeded while the LLM is still generating, or because the API call blocks the event loop. A common mistake is making a synchronous client call inside an `async def` endpoint: the call never yields control, so the entire event loop stalls until the LLM responds, and slow requests surface as a `TimeoutError` or `504 Gateway Timeout`. Note that FastAPI itself has no request timeout setting; the limit comes from the ASGI server or a reverse proxy in front of it.
Typical broken code example:

```python
from fastapi import FastAPI
from openai import OpenAI
import os

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.get("/generate")
async def generate():
    # Synchronous call inside an async endpoint blocks the event loop
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
    return {"text": response.choices[0].message.content}
```

Output:

```
TimeoutError: Request timed out after 60 seconds
```
The fix
Make the FastAPI endpoint asynchronous, switch to the async client (`AsyncOpenAI` instead of `OpenAI`), and await the API call. This prevents blocking the event loop and allows FastAPI to handle other requests while waiting. Additionally, configure the server or client timeout if needed.
Corrected code example:

```python
from fastapi import FastAPI
from openai import AsyncOpenAI
import os

app = FastAPI()
# AsyncOpenAI returns awaitable coroutines; the sync OpenAI client cannot be awaited
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.get("/generate")
async def generate():
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
    return {"text": response.choices[0].message.content}
```

Output:

```
{"text": "Hello"}
```

Preventing it in production
Use these best practices to avoid timeout errors in production:
- Always use `async def` endpoints and `await` your LLM API calls.
- Configure server timeout settings (e.g., uvicorn's `--timeout-keep-alive` or reverse-proxy timeouts) to accommodate longer LLM response times.
- Implement retry logic with exponential backoff for transient API errors.
- Use background tasks or queues for very long-running LLM calls to avoid blocking client requests.
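As a sketch of the retry point above, transient timeouts and connection errors can be retried with exponential backoff plus jitter. The helper below is an illustrative example (the function name `call_with_backoff` and the chosen delays are assumptions, not part of any library API):

```python
import asyncio
import random

async def call_with_backoff(coro_factory, max_retries=3, base_delay=1.0):
    """Retry an async call with exponential backoff on transient errors."""
    for attempt in range(max_retries + 1):
        try:
            return await coro_factory()
        except (TimeoutError, ConnectionError):
            if attempt == max_retries:
                raise  # out of retries; surface the error to the caller
            # Exponential backoff: base_delay * 1, 2, 4, ... plus small jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)

# Usage inside an endpoint: wrap the LLM call in a zero-argument factory, e.g.
# result = await call_with_backoff(
#     lambda: client.chat.completions.create(model="gpt-4o", messages=msgs)
# )
```

Passing a factory rather than a coroutine matters: a coroutine object can only be awaited once, so each retry needs a fresh call.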
Key Takeaways
- Always use async FastAPI endpoints with awaited LLM API calls to prevent blocking.
- Increase server and proxy timeout settings to handle slow LLM responses.
- Implement retries with exponential backoff for robust API integration.
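As a sketch of the timeout takeaways, this is one way the limits could be raised on both sides, assuming the `openai` v1.x Python client and uvicorn (the specific values are illustrative, not recommendations):

```python
# Client side: fail fast with an explicit per-request timeout (seconds).
# timeout and max_retries are constructor options on the openai v1.x client.
from openai import AsyncOpenAI

client = AsyncOpenAI(timeout=60.0, max_retries=2)

# Server side: raise uvicorn's keep-alive timeout for slow responses, e.g.
#   uvicorn app:app --timeout-keep-alive 120
# and match any reverse-proxy limit (nginx: proxy_read_timeout 120s;)
# so the proxy does not return 504 before the LLM finishes.
```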