High severity · HTTP 504 · Intermediate · Fix: 5-15 min

TimeoutError

asyncio.exceptions.TimeoutError

What this error means

A FastAPI request times out while waiting for a streaming response from an LLM, and the client connection closes before the stream completes.

Stack trace

traceback

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/base.py", line 41, in __call__
    await self.dispatch_func(request, call_next)
  File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 276, in call_next
    response = await call_next(request)
  File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app
    await response(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/responses.py", line 338, in __call__
    async for chunk in self.body_iterator:
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 481, in wait_for
    raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError

QUICK FIX

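Raise the timeout that sits in front of the stream (an asyncio.wait_for deadline in middleware or application code, or the read timeout of a reverse proxy) and ensure your streaming generator yields data frequently. Uvicorn's --timeout-keep-alive applies to idle keep-alive connections and --timeout-graceful-shutdown applies to shutdown, so increasing them alone is unlikely to resolve a timeout on an active stream.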

Why it happens

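Neither FastAPI nor Starlette applies a timeout to request handling or response streaming by default. The asyncio TimeoutError in the trace is raised by asyncio.wait_for, which means some layer has placed a deadline on the stream, typically middleware or application code that wraps it. When the LLM response generator takes longer than that deadline to produce its next chunk, the wait is cancelled and the connection is terminated. A 504 is usually returned by a reverse proxy or load balancer in front of the app once its own read timeout expires. Both failures are more likely when the generator yields data slowly or the configured timeouts are too low.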
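The mechanism can be reproduced without FastAPI. The sketch below is a minimal illustration, assuming the deadline comes from asyncio.wait_for wrapped around each chunk read. The names slow_llm_stream, read_with_deadline, and demo, and all timing values, are hypothetical and chosen only for the example; they are not part of FastAPI or Starlette.

```python
import asyncio
from typing import AsyncIterator, List


async def slow_llm_stream(delay: float) -> AsyncIterator[str]:
    """Hypothetical stand-in for an LLM client that pauses before each chunk."""
    for i in range(3):
        await asyncio.sleep(delay)
        yield f"chunk {i}\n"


async def read_with_deadline(
    stream: AsyncIterator[str], per_chunk_timeout: float
) -> List[str]:
    """Collect chunks, allowing at most per_chunk_timeout seconds for each one."""
    chunks: List[str] = []
    iterator = stream.__aiter__()
    while True:
        try:
            # wait_for cancels the pending read and raises asyncio.TimeoutError
            # when the next chunk does not arrive in time.
            chunk = await asyncio.wait_for(iterator.__anext__(), per_chunk_timeout)
        except StopAsyncIteration:
            return chunks
        chunks.append(chunk)


async def demo() -> str:
    """Return 'timed out' because each chunk takes longer than the deadline."""
    try:
        await read_with_deadline(slow_llm_stream(delay=0.2), per_chunk_timeout=0.05)
    except asyncio.TimeoutError:
        return "timed out"
    return "completed"
```

Running asyncio.run(demo()) exercises the failing path. Raising per_chunk_timeout above the gap between chunks, or shortening the gap, lets the same stream complete.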

Detection

Monitor FastAPI logs for asyncio TimeoutError exceptions raised in streaming endpoints, and track client disconnects and 504 Gateway Timeout responses.
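One way to make these timeouts visible in logs is to wrap the generator and record some context before re-raising. This is a sketch under assumptions: log_stream_timeouts and the logger name "llm_stream" are hypothetical, and the wrapper only sees timeouts raised inside the source generator. A deadline applied from outside the generator reaches it as a cancellation instead, which this wrapper does not log.

```python
import asyncio
import logging
import time
from typing import AsyncIterator

# Hypothetical logger name used only for this example.
logger = logging.getLogger("llm_stream")


async def log_stream_timeouts(
    source: AsyncIterator[str], endpoint: str
) -> AsyncIterator[str]:
    """Pass chunks through unchanged and log context if the source times out."""
    started = time.monotonic()
    sent = 0
    try:
        async for chunk in source:
            sent += 1
            yield chunk
    except asyncio.TimeoutError:
        # Record how far the stream got so slow endpoints can be identified.
        logger.error(
            "stream timeout endpoint=%s chunks_sent=%d elapsed=%.1fs",
            endpoint,
            sent,
            time.monotonic() - started,
        )
        raise
```

In an endpoint, the wrapped generator would be passed to StreamingResponse in place of the raw one, for example StreamingResponse(log_stream_timeouts(llm_stream(), "/stream"), media_type="text/plain").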

Causes & fixes

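A timeout wrapped around the stream (in middleware, application code, or a reverse proxy) is too short for slow or large LLM streaming responses

✓ Fix

Raise the timeout in the layer that enforces it, such as the asyncio.wait_for deadline in ASGI middleware or the read timeout of the proxy in front of Uvicorn, so that it covers the longest expected gap between chunks.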

LLM streaming generator yields data too slowly or stalls

✓ Fix

Optimize the LLM streaming code to yield chunks more frequently or implement keep-alive chunks to prevent timeouts.

Client or proxy closes connection due to perceived inactivity

✓ Fix

Send periodic heartbeat or whitespace chunks in the streaming response to keep the connection alive.

Blocking synchronous code in async streaming handler delays yielding data

✓ Fix

Refactor blocking calls to async equivalents or run blocking code in a thread pool executor to avoid blocking the event loop.
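The thread pool approach can be sketched as follows, assuming the LLM SDK exposes a synchronous iterator of chunks. blocking_llm_chunks is a hypothetical stand-in for such an SDK, and stream_without_blocking is an illustrative helper name. asyncio.to_thread is available from Python 3.9.

```python
import asyncio
import time
from typing import AsyncIterator, Iterator

# Sentinel returned by next() when the blocking iterator is exhausted.
_DONE = object()


def blocking_llm_chunks() -> Iterator[str]:
    """Hypothetical synchronous SDK call that blocks while producing chunks."""
    for i in range(3):
        time.sleep(0.05)  # blocking wait, would stall the event loop if run inline
        yield f"chunk {i}\n"


async def stream_without_blocking(chunks: Iterator[str]) -> AsyncIterator[str]:
    """Pull each chunk in a worker thread so the event loop stays responsive."""
    while True:
        chunk = await asyncio.to_thread(next, chunks, _DONE)
        if chunk is _DONE:
            break
        yield chunk
```

The async generator returned by stream_without_blocking(blocking_llm_chunks()) can then be passed to StreamingResponse. Each next() call runs in a worker thread, so other requests continue to be served while a chunk is pending.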

Code: broken vs fixed

Broken - triggers the error

python

import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def llm_stream():
    # Simulate a slow streaming response: 3 s of silence before each chunk
    for i in range(5):
        await asyncio.sleep(3)  # slow chunk
        yield f"chunk {i}\n"

@app.get("/stream")
async def stream():
    # Times out when any layer in front of the stream allows less than 3 s between chunks
    return StreamingResponse(llm_stream(), media_type="text/plain")

Fixed - works correctly

python

import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def llm_stream():
    for i in range(5):
        await asyncio.sleep(1)  # shorter gap between chunks to stay under the timeout
        yield f"chunk {i}\n"

@app.get("/stream")
async def stream():
    # Fixed by yielding more frequently
    return StreamingResponse(llm_stream(), media_type="text/plain")

# Run with increased timeout settings:
# uvicorn main:app --timeout-keep-alive 30 --timeout-graceful-shutdown 30
# These flags cover idle keep-alive connections and shutdown only; also raise any
# proxy read timeout or asyncio.wait_for deadline that wraps the stream.

The fixed version shortens the sleep in the streaming generator so that data is yielded more frequently, and runs Uvicorn with higher timeout settings.

⚠

Workaround

Wrap the streaming generator with a heartbeat coroutine that yields whitespace or newline characters every few seconds to keep the connection alive until the LLM produces real data.
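A minimal sketch of such a wrapper is shown below. It assumes the client tolerates empty lines in the stream. with_heartbeat, interval, and beat are hypothetical names introduced for illustration. The wrapper uses asyncio.wait rather than asyncio.wait_for so that a slow chunk is not cancelled when the interval elapses.

```python
import asyncio
from typing import AsyncIterator


async def with_heartbeat(
    source: AsyncIterator[str], interval: float = 5.0, beat: str = "\n"
) -> AsyncIterator[str]:
    """Yield chunks from source, emitting beat whenever source is silent for interval seconds."""
    iterator = source.__aiter__()
    pending = None
    try:
        while True:
            if pending is None:
                # Start fetching the next real chunk in the background.
                pending = asyncio.ensure_future(iterator.__anext__())
            # asyncio.wait returns on timeout without cancelling the pending read.
            done, _ = await asyncio.wait({pending}, timeout=interval)
            if not done:
                yield beat  # nothing yet: keep the connection alive
                continue
            finished, pending = pending, None
            try:
                chunk = finished.result()
            except StopAsyncIteration:
                return  # source exhausted
            yield chunk
    finally:
        # Stop the background read if the consumer goes away early.
        if pending is not None:
            pending.cancel()
```

In an endpoint this would wrap the LLM generator, for example StreamingResponse(with_heartbeat(llm_stream(), interval=5.0), media_type="text/plain"). The interval should be shorter than the shortest inactivity timeout on the path between server and client.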

✓

Prevention

Design your FastAPI LLM streaming endpoints to yield data frequently, and configure your ASGI server (e.g., Uvicorn) and any reverse proxy in front of it with timeout values high enough to handle slow or large streaming responses reliably.

Python 3.9+ · fastapi >=0.70.0 · tested on 0.95.0

Verified 2026-04
