IncompleteReadError
http.client.IncompleteRead
Stack trace
http.client.IncompleteRead: IncompleteRead(123 bytes read, 456 more expected)
  File "/usr/lib/python3.9/http/client.py", line 639, in read
    raise IncompleteRead(self.partial, self.expected)
  File "/usr/lib/python3.9/http/client.py", line 676, in _readall_chunked
    chunk = self._safe_read(self.chunk_left)
  File "/usr/lib/python3.9/http/client.py", line 606, in _safe_read
    data = self.fp.read(amt)
  File "/usr/lib/python3.9/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.9/ssl.py", line 1052, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.9/ssl.py", line 911, in read
    return self._sslobj.read(len, buffer)
Why it happens
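FastAPI's StreamingResponse pulls each chunk of the body from an async generator or iterator. Because no Content-Length is known in advance, an HTTP/1.1 server such as Uvicorn sends the body with chunked transfer encoding, which is why _readall_chunked appears in the trace above. If the generator stops early, for example because the LLM streaming client closed its stream or raised mid-iteration, the response ends before the output is complete. When the generator raises after the headers have been sent, the server can no longer return an error status, so it drops the connection without the terminating zero-length chunk, and a Python client built on http.client raises IncompleteRead. Streams that are not fully consumed, LLM clients that are closed too early, client disconnects, and network interruptions all cut the output short in the same way.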
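To see the client-side half of this in isolation, the sketch below stands in for a FastAPI server whose generator died mid-stream. As an assumption made so the example needs only the standard library, a raw socket server sends one complete chunk and then closes the connection without the terminating zero-length chunk. Reading the response with http.client then raises IncompleteRead, whose partial attribute holds the bytes that did arrive. The function name serve_truncated_chunked_response is hypothetical.

```python
import http.client
import socket
import threading


def serve_truncated_chunked_response(listener: socket.socket) -> None:
    """Accept one connection and send a chunked body that never finishes."""
    conn, _ = listener.accept()
    with conn:
        # Drain the request headers first; closing with unread data can turn
        # the close into a TCP reset instead of a clean end-of-stream.
        request = b""
        while b"\r\n\r\n" not in request:
            data = conn.recv(4096)
            if not data:
                break
            request += data
        # One complete chunk ("hello"), then close WITHOUT the final
        # zero-length chunk ("0\r\n\r\n") that marks the end of the body.
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Type: text/event-stream\r\n"
            b"Transfer-Encoding: chunked\r\n"
            b"\r\n"
            b"5\r\nhello\r\n"
        )


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(target=serve_truncated_chunked_response, args=(listener,))
server.start()

client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
try:
    client.request("GET", "/stream")
    response = client.getresponse()
    try:
        response.read()
        partial = None
    except http.client.IncompleteRead as exc:
        partial = exc.partial  # bytes received before the stream was cut off
        print("IncompleteRead, partial body:", partial)
finally:
    client.close()
    server.join()
    listener.close()
```

Running it should report the partial body b'hello'. A real client can use exc.partial the same way to log or salvage what it received before the connection dropped.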
Detection
Log how many chunks and bytes each streamed response delivers, and compare that with what the client actually received. Wrapping the streaming generator with logging shows whether it ended normally, raised an exception, or was cancelled or closed mid-iteration.
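One way to do this is a pass-through wrapper around the streaming generator. The sketch below assumes chunks are str, and monitored is a hypothetical name: it counts chunks and characters, records whether the stream completed, raised, was cancelled, or was closed by its consumer, and logs each outcome without altering the chunks.

```python
import asyncio
import logging
from typing import Any, AsyncIterator, Dict

logger = logging.getLogger("stream_monitor")


async def monitored(stream: AsyncIterator[str], stats: Dict[str, Any]) -> AsyncIterator[str]:
    """Relay chunks unchanged while recording how far the stream got and how it ended."""
    stats.update(chunks=0, chars=0, status="running")
    try:
        async for chunk in stream:
            stats["chunks"] += 1
            stats["chars"] += len(chunk)
            yield chunk
    except GeneratorExit:
        # The consumer stopped iterating before the upstream finished.
        stats["status"] = "closed_by_consumer"
        logger.warning("stream closed early after %d chunks", stats["chunks"])
        raise
    except asyncio.CancelledError:
        # The task driving the stream was cancelled (e.g. on shutdown).
        stats["status"] = "cancelled"
        logger.warning("stream cancelled after %d chunks", stats["chunks"])
        raise
    except Exception as exc:
        # The upstream raised mid-stream; log with traceback, then propagate.
        stats["status"] = "error: " + type(exc).__name__
        logger.exception("stream failed after %d chunks", stats["chunks"])
        raise
    else:
        stats["status"] = "complete"
        logger.info("stream complete: %d chunks, %d chars", stats["chunks"], stats["chars"])
```

In an endpoint this would wrap the existing generator, for example StreamingResponse(monitored(llm_stream(), stats), media_type="text/event-stream") with a fresh stats dict per request. A status other than "complete" in the logs marks a truncated response.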
Causes & fixes
LLM streaming client disconnects or closes the stream before full output is sent
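Keep the LLM client and its stream open until the generator has been fully iterated. The endpoint returns the StreamingResponse before any chunk is sent, so a client or stream that is closed when the endpoint returns (for example, by an async with block in the endpoint body) is already closed when streaming starts. Open it inside the generator, or keep it alive at application level.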
FastAPI StreamingResponse generator function yields incomplete chunks or raises exceptions mid-stream
Catch exceptions inside the generator, log them, and end the stream deliberately (for example with an error event) instead of letting the exception escape after the headers have been sent.
Client disconnects before the full streamed response is received
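Detect client disconnects (for example with Starlette's Request.is_disconnected(), available in FastAPI) so the server stops pulling from the LLM, and log incomplete streams for diagnostics. The client can then retry the request.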
Network interruptions or timeouts during streaming
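Configure read timeouts on the LLM client and the FastAPI server stack that are long enough for slow streams, bound how long the generator waits for each chunk, and retry transient failures that occur before any output has been sent.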
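For the disconnect and timeout cases, a relay generator can check for a client disconnect before pulling each chunk and bound how long it waits for the next one. This is a sketch under stated assumptions: relay and chunk_timeout are hypothetical names, and is_disconnected is any zero-argument coroutine function returning bool, which is the shape of Starlette's Request.is_disconnected.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def relay(
    stream: AsyncIterator[str],
    is_disconnected: Callable[[], Awaitable[bool]],
    chunk_timeout: float = 30.0,
) -> AsyncIterator[str]:
    """Forward chunks until the upstream ends or the HTTP client goes away.

    Raises asyncio.TimeoutError if the upstream produces no chunk within
    chunk_timeout seconds.
    """
    iterator = stream.__aiter__()
    try:
        while True:
            if await is_disconnected():
                # Nobody is reading any more: stop pulling tokens from the LLM.
                break
            try:
                chunk = await asyncio.wait_for(iterator.__anext__(), timeout=chunk_timeout)
            except StopAsyncIteration:
                break  # upstream finished normally
            yield chunk
    finally:
        # Close the upstream generator (if it supports aclose) so it can
        # release its connection promptly.
        aclose = getattr(iterator, "aclose", None)
        if aclose is not None:
            await aclose()
```

Inside an endpoint that takes request: Request, this would be used as StreamingResponse(relay(llm_client.stream(), request.is_disconnected), media_type="text/event-stream"), where llm_client stands for your LLM SDK's streaming client. A stalled upstream surfaces as asyncio.TimeoutError, which a wrapper can catch and turn into an error event.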
Code: broken vs fixed
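Broken
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

# llm_client is a placeholder for your LLM SDK's streaming client.

async def llm_stream():
    # No error handling: if llm_client.stream() raises mid-iteration, the
    # exception escapes after the response headers have been sent, the server
    # drops the connection, and the client sees IncompleteRead.
    async for chunk in llm_client.stream():
        yield chunk

@app.get("/stream")
async def stream_endpoint():
    return StreamingResponse(llm_stream(), media_type="text/event-stream")

Fixed
import logging

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()
logger = logging.getLogger(__name__)

# llm_client is a placeholder for your LLM SDK's streaming client; initialize
# it elsewhere with API keys read from os.environ.

async def llm_stream(request: Request):
    try:
        async for chunk in llm_client.stream():
            # Stop pulling from the LLM once the HTTP client has gone away.
            if await request.is_disconnected():
                logger.info("client disconnected; stopping LLM stream")
                break
            yield chunk
    except Exception:
        # Headers are already sent, so the status code cannot change. Log the
        # failure and end the body cleanly with an in-band SSE error event
        # instead of letting the connection drop mid-chunk.
        logger.exception("LLM stream failed mid-response")
        yield "event: error\ndata: stream interrupted\n\n"

@app.get("/stream")
async def stream_endpoint(request: Request):
    return StreamingResponse(llm_stream(request), media_type="text/event-stream")
Workaround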
Wrap the streaming generator in try/except to catch early termination, then buffer partial output and retry the LLM call or return a fallback message.
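A minimal sketch of that workaround follows, assuming make_stream is a hypothetical zero-argument factory that starts a fresh LLM stream (for example lambda: llm_client.stream()) and that FALLBACK_MESSAGE is placeholder text. It retries only while nothing has been sent yet: once part of the answer is on the wire, a retry would duplicate text, so it logs the buffered partial output and ends with the fallback marker, letting the HTTP response still terminate cleanly.

```python
import asyncio
import logging
from typing import AsyncIterator, Callable, List

logger = logging.getLogger("stream_fallback")

# Hypothetical marker text sent when the answer cannot be completed.
FALLBACK_MESSAGE = "[response interrupted, please retry]"


async def with_fallback(
    make_stream: Callable[[], AsyncIterator[str]],
    max_attempts: int = 3,
    backoff: float = 0.5,
) -> AsyncIterator[str]:
    """Stream from make_stream(), retrying only while nothing has been sent."""
    sent: List[str] = []  # buffer of chunks already forwarded downstream
    for attempt in range(1, max_attempts + 1):
        try:
            async for chunk in make_stream():
                sent.append(chunk)
                yield chunk
            return  # upstream finished normally
        except Exception as exc:
            if sent:
                # Part of the answer is already on the wire; retrying would
                # duplicate it, so log the partial output and finish cleanly.
                logger.warning("stream failed after %d chars (%r); partial output: %r",
                               sum(len(c) for c in sent), exc, "".join(sent))
                yield FALLBACK_MESSAGE
                return
            logger.warning("attempt %d/%d failed before any output: %r",
                           attempt, max_attempts, exc)
            if attempt < max_attempts:
                await asyncio.sleep(backoff * attempt)  # linear backoff
    # Every attempt failed before producing a single chunk.
    yield FALLBACK_MESSAGE
```

Usage would be StreamingResponse(with_fallback(lambda: llm_client.stream()), media_type="text/event-stream"). If clients must distinguish a fallback from real output, an explicit error event is a safer marker than plain text.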
Prevention
Use robust async streaming patterns with proper error handling and client disconnect detection. Prefer LLM clients with built-in streaming reliability and retries; these reduce truncated output but cannot rule it out, so both server and client should still handle a stream that ends early.