TimeoutError
asyncio.exceptions.TimeoutError
Stack trace
Traceback (most recent call last):
  File "/app/main.py", line 45, in stream_llm_response
    async for chunk in llm_stream:
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 481, in wait_for
    raise asyncio.exceptions.TimeoutError
asyncio.exceptions.TimeoutError
Why it happens
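FastAPI's StreamingResponse forwards each chunk to the client as soon as the LLM streaming generator yields it. StreamingResponse itself applies no timeout. The TimeoutError comes from asyncio.wait_for (the wait_for frame in the stack trace), which your endpoint code or the LLM client library wraps around a read from the stream. If the LLM or the network delays the next chunk beyond that timeout, wait_for cancels the pending read and raises asyncio.TimeoutError. This usually happens when the model is slow to produce its first tokens or the generator stalls. On Python 3.10, asyncio.TimeoutError is not the built-in TimeoutError (the two were unified in Python 3.11), so catch asyncio.TimeoutError explicitly.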
Detection
Monitor FastAPI logs for asyncio.TimeoutError exceptions raised in streaming endpoints, and track response-latency metrics (such as time to first chunk and the longest gap between chunks) to detect slow or stalled LLM streams before clients time out.
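One way to collect the latency metrics mentioned above is to wrap the stream in a pass-through generator that records the time to first chunk and the largest gap between chunks, and logs any timeout. This is a sketch using the standard logging module; timed_stream(), its stats dict, and the "stream_metrics" logger name are hypothetical, not part of FastAPI.

```python
import asyncio
import logging
import time
from typing import AsyncIterator

logger = logging.getLogger("stream_metrics")  # hypothetical logger name


async def timed_stream(
    source: AsyncIterator[bytes], name: str, stats: dict
) -> AsyncIterator[bytes]:
    """Pass chunks through unchanged while recording latency stats."""
    start = time.monotonic()
    last = start
    stats.update(first_chunk_s=None, max_gap_s=0.0, chunks=0)
    try:
        async for chunk in source:
            now = time.monotonic()
            if stats["first_chunk_s"] is None:
                stats["first_chunk_s"] = now - start  # time to first chunk
            else:
                stats["max_gap_s"] = max(stats["max_gap_s"], now - last)
            stats["chunks"] += 1
            last = now
            yield chunk
    except asyncio.TimeoutError:
        logger.error("stream %s timed out after %.2f s", name, time.monotonic() - start)
        raise
    finally:
        # Runs on normal completion, on error, and when the client disconnects
        logger.info("stream %s finished: %s", name, stats)
```

In an endpoint it could be used as StreamingResponse(timed_stream(llm_stream(), "/stream", {}), media_type="text/event-stream"), with the logged values fed into whatever metrics system the service already uses.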
Causes & fixes
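The LLM streaming generator stalls or delays chunks longer than the timeout applied to the stream (StreamingResponse has no built-in timeout)
Raise the timeout passed to asyncio.wait_for around each chunk read (or the LLM client's own read timeout) so it covers the slowest expected gap between chunks. asyncio.wait_for accepts an awaitable, not an async generator, so apply it per chunk rather than to the generator as a whole. Make sure server and reverse-proxy timeouts are at least as long.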
Network latency or slow LLM model responses delay streaming chunks
Use a faster or smaller LLM model, or improve network connectivity to the LLM provider, to shorten the gaps between chunks.
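An async generator implementation that blocks the event loop or deadlocks
Make sure the LLM streaming generator yields chunks promptly and never blocks the event loop: avoid synchronous calls such as time.sleep() or a synchronous HTTP client inside async code, and offload unavoidable blocking work with asyncio.to_thread().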
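If the only available LLM client is synchronous, one option is to run each blocking next() call in a worker thread with asyncio.to_thread() (Python 3.9+) so the event loop keeps serving other requests. In this sketch blocking_llm_tokens() is a hypothetical stand-in that uses time.sleep() to simulate a blocking client.

```python
import asyncio
import time
from typing import AsyncIterator, Iterator


def blocking_llm_tokens() -> Iterator[str]:
    # Hypothetical synchronous client: time.sleep() blocks whichever thread runs it
    for token in ("Hello", " world"):
        time.sleep(0.05)
        yield token


async def non_blocking_stream() -> AsyncIterator[bytes]:
    tokens = blocking_llm_tokens()
    done = object()  # sentinel that next() returns once the iterator is exhausted
    while True:
        # next() runs in a worker thread, so the event loop stays responsive
        token = await asyncio.to_thread(next, tokens, done)
        if token is done:
            break
        yield f"data: {token}\n\n".encode()
```

The generator is only ever advanced by one thread at a time, because each to_thread call is awaited before the next one starts.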
Code: broken vs fixed
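import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
app = FastAPI()
async def llm_stream():
    # Simulate a slow LLM: the first chunk takes 10 s to arrive
    await asyncio.sleep(10)
    yield b"data: chunk\n\n"
async def short_timeout_stream():
    # Broken: allows only 5 s per chunk, less than the LLM's 10 s delay
    stream = llm_stream()
    while True:
        try:
            chunk = await asyncio.wait_for(anext(stream), timeout=5)  # anext(): Python 3.10+
        except StopAsyncIteration:
            break
        yield chunk
@app.get("/stream")
async def stream():
    # asyncio.TimeoutError escapes the generator after 5 s and the connection closes mid-response
    return StreamingResponse(short_timeout_stream(), media_type="text/event-stream")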
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
app = FastAPI()
async def llm_stream():
    # Simulate a slow LLM: the first chunk still takes 10 s
    await asyncio.sleep(10)
    yield b"data: chunk\n\n"
async def wrapped_stream():
    # asyncio.wait_for() needs an awaitable, not an async generator,
    # so apply it to each anext() call: allow up to 30 s per chunk
    stream = llm_stream()
    while True:
        try:
            chunk = await asyncio.wait_for(anext(stream), timeout=30)
        except StopAsyncIteration:
            break  # the LLM finished normally
        except asyncio.TimeoutError:
            # Report the timeout as an SSE error event instead of dropping the connection
            yield b"event: error\ndata: Timeout occurred\n\n"
            break
        yield chunk
@app.get("/stream")
async def stream():
    # Serve the wrapped generator, which tolerates up to 30 s between chunks
    return StreamingResponse(wrapped_stream(), media_type="text/event-stream")
Workaround
Catch asyncio.TimeoutError around the streaming generator, then either send a fallback message to the client or retry the LLM request, so the connection stays alive and ends cleanly instead of dropping mid-stream.
Prevention
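Design LLM streaming generators to yield data frequently, and set timeouts along the whole path (your code, the LLM client, the server, and any reverse proxy) to accommodate expected LLM response times. When long gaps between chunks are unavoidable, send periodic heartbeat messages to keep the stream alive.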