TimeoutError
asyncio.exceptions.TimeoutError
Stack trace
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/base.py", line 41, in __call__
    await self.dispatch_func(request, call_next)
  File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 276, in call_next
    response = await call_next(request)
  File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app
    await response(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/responses.py", line 338, in __call__
    async for chunk in self.body_iterator:
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 481, in wait_for
    raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError
Why it happens
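Neither FastAPI nor Starlette applies a timeout to response streaming by default. The asyncio.exceptions.TimeoutError at the bottom of the trace is raised by asyncio.wait_for, so a deadline is being enforced somewhere around the body iterator, typically in custom middleware, in the streaming generator itself, or in the LLM client library. When the LLM produces its next chunk more slowly than that deadline allows, wait_for cancels the pending read, raises TimeoutError, and the connection is terminated mid-response. Reverse proxies and load balancers add their own idle timeouts on top of this, which reach the client as dropped connections or 504 responses rather than as this exception.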
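The failure mode in the trace can be reproduced with the standard library alone. The sketch below assumes the deadline is applied per chunk with asyncio.wait_for, as the last frame of the trace suggests; slow_chunks and read_with_deadline are hypothetical names used only for illustration.

```python
import asyncio


async def slow_chunks(delay: float):
    """Stand-in for an LLM stream: each chunk takes `delay` seconds to arrive."""
    for i in range(3):
        await asyncio.sleep(delay)
        yield f"chunk {i}\n"


async def read_with_deadline(delay: float, timeout: float) -> str:
    """Pull the first chunk under a per-chunk deadline and report what happened."""
    stream = slow_chunks(delay)
    try:
        # wait_for cancels the pending read and raises TimeoutError on expiry
        chunk = await asyncio.wait_for(stream.__anext__(), timeout=timeout)
        return f"received {chunk!r}"
    except asyncio.TimeoutError:
        return "asyncio.TimeoutError"
    finally:
        await stream.aclose()
```

Calling asyncio.run(read_with_deadline(delay=3, timeout=1)) takes the timeout branch, while a delay shorter than the timeout returns the first chunk. Note that wait_for cancels the generator's pending step, so the stream cannot be resumed after the timeout.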
Detection
Watch the application logs for asyncio.exceptions.TimeoutError raised from streaming endpoints, and correlate those entries with client disconnects and 504 Gateway Timeout responses reported by any proxy or load balancer in front of the app.
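One way to make these timeouts easy to find in the logs is to wrap the streaming generator so that each timeout is recorded with the stream's name, elapsed time, and chunk count before being re-raised. This is a minimal sketch using only the standard library; logged_stream and the logger name "stream.timeouts" are hypothetical.

```python
import asyncio
import logging
import time
from typing import AsyncIterator

logger = logging.getLogger("stream.timeouts")  # hypothetical logger name


async def logged_stream(source: AsyncIterator[str], name: str) -> AsyncIterator[str]:
    """Pass chunks through unchanged; log and re-raise if the source times out."""
    started = time.monotonic()
    sent = 0
    try:
        async for chunk in source:
            sent += 1
            yield chunk
    except asyncio.TimeoutError:
        elapsed = time.monotonic() - started
        logger.warning(
            "stream %s timed out after %.1fs and %d chunk(s)", name, elapsed, sent
        )
        raise  # keep the original behaviour; this wrapper only observes
```

In an endpoint the wrapper would sit between the generator and the response, for example StreamingResponse(logged_stream(llm_stream(), "stream"), media_type="text/plain").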
Causes & fixes
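A timeout wrapped around the stream (for example asyncio.wait_for in middleware or in the LLM client) is too short for slow or large LLM streaming responses
Raise or remove that timeout for streaming routes, or apply it per chunk instead of to the whole response. Uvicorn's --timeout-keep-alive only governs idle keep-alive connections between requests, so raising it alone does not extend an in-flight stream.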
LLM streaming generator yields data too slowly or stalls
Optimize the LLM streaming code to yield chunks more frequently or implement keep-alive chunks to prevent timeouts.
Client or proxy closes connection due to perceived inactivity
Send periodic heartbeat or whitespace chunks in the streaming response to keep the connection alive.
Blocking synchronous code in async streaming handler delays yielding data
Refactor blocking calls to async equivalents or run blocking code in a thread pool executor to avoid blocking the event loop.
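The last fix above, moving blocking work off the event loop, can be sketched as follows. The example assumes Python 3.9+ (for asyncio.to_thread) and a synchronous token iterator; blocking_tokens is a hypothetical stand-in for a blocking LLM client, and stream_from_thread is an illustrative name.

```python
import asyncio
import time
from typing import AsyncIterator, Iterator


def blocking_tokens() -> Iterator[str]:
    """Hypothetical synchronous LLM client: each token blocks the calling thread."""
    for i in range(3):
        time.sleep(0.05)
        yield f"token {i}\n"


async def stream_from_thread(tokens: Iterator[str]) -> AsyncIterator[str]:
    """Advance a blocking iterator in a worker thread so the event loop stays free."""
    done = object()  # sentinel returned by next() when the iterator is exhausted
    while True:
        # next(tokens, done) runs in the default thread pool, not on the loop
        chunk = await asyncio.to_thread(next, tokens, done)
        if chunk is done:
            return
        yield chunk
```

Because each next() call runs in a worker thread, other requests and any heartbeat logic keep running while the client waits for the next token.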
Code: broken vs fixed
# Broken: each chunk takes 3 s to arrive
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def llm_stream():
    # Simulate a slow streaming response
    for i in range(5):
        await asyncio.sleep(3)  # slow chunk
        yield f"chunk {i}\n"


@app.get("/stream")
async def stream():
    # Any deadline shorter than 3 s per chunk raises TimeoutError here
    return StreamingResponse(llm_stream(), media_type="text/plain")

# Fixed: chunks arrive every second
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def llm_stream():
    for i in range(5):
        await asyncio.sleep(1)  # faster chunk to avoid timeout
        yield f"chunk {i}\n"


@app.get("/stream")
async def stream():
    # Fixed by faster yields
    return StreamingResponse(llm_stream(), media_type="text/plain")

# Run with longer server timeouts (these cover idle keep-alive
# connections and shutdown, not a per-chunk deadline in the app):
# uvicorn main:app --timeout-keep-alive 30 --timeout-graceful-shutdown 30
print("Streaming endpoint ready with adjusted timeouts")
Workaround
Wrap the streaming generator with a heartbeat coroutine that yields whitespace or newline characters every few seconds to keep the connection alive until the LLM produces real data.
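A heartbeat wrapper along these lines might look like the sketch below. It is an illustration under stated assumptions, not a drop-in implementation: with_heartbeat, its interval default, and the newline heartbeat are hypothetical choices, and the client must tolerate the extra whitespace. It uses asyncio.wait rather than asyncio.wait_for so that a quiet interval does not cancel the pending read from the LLM.

```python
import asyncio
from typing import AsyncIterator

_DONE = object()  # sentinel marking the end of the source stream


async def _next_or_done(iterator):
    """Return the next chunk, or _DONE once the source is exhausted."""
    try:
        return await iterator.__anext__()
    except StopAsyncIteration:
        return _DONE


async def with_heartbeat(
    source: AsyncIterator[str],
    interval: float = 5.0,
    heartbeat: str = "\n",
) -> AsyncIterator[str]:
    """Yield chunks from `source`; emit `heartbeat` after each silent `interval`."""
    iterator = source.__aiter__()
    pending = None  # task for the chunk currently being awaited
    try:
        while True:
            if pending is None:
                pending = asyncio.ensure_future(_next_or_done(iterator))
            # Unlike wait_for, wait() leaves the task running on timeout
            done, _ = await asyncio.wait({pending}, timeout=interval)
            if not done:
                yield heartbeat
                continue
            result, pending = pending.result(), None
            if result is _DONE:
                return
            yield result
    finally:
        # Consumer went away or the stream failed: stop the in-flight read
        if pending is not None:
            pending.cancel()
```

In the endpoint this would be used as StreamingResponse(with_heartbeat(llm_stream()), media_type="text/plain"). For Server-Sent Events, a comment line such as ": keep-alive\n\n" is a safer heartbeat than bare whitespace.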
Prevention
Design FastAPI LLM streaming endpoints to yield data frequently, and set every timeout along the request path (application-level deadlines, the ASGI server such as Uvicorn, and any reverse proxy) high enough for the slowest streaming response you expect.