How to use Server-Sent Events with FastAPI for LLM streaming
Quick answer
Use FastAPI's StreamingResponse with the text/event-stream content type to implement Server-Sent Events (SSE) for LLM streaming. Connect to an LLM API such as OpenAI's gpt-4o with streaming enabled, then yield chunks as SSE events so the client receives real-time updates.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install fastapi uvicorn "openai>=1.0"
Setup
Install the required packages and set your OpenAI API key as an environment variable.
- Install FastAPI, Uvicorn, and the OpenAI SDK (quote the version spec so the shell does not treat > as a redirect):

pip install fastapi uvicorn "openai>=1.0"

Step by step
This example shows a complete FastAPI app that streams responses from OpenAI's gpt-4o model using Server-Sent Events (SSE). The client receives partial completions in real time.
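Before the full app, it helps to see the SSE wire format itself: each event is one or more data: lines terminated by a blank line. A minimal sketch of a formatter and parser, using only the standard library (the helper names format_sse and parse_sse are illustrative, not part of any SDK):

```python
from typing import List, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Format one payload as a Server-Sent Event string."""
    prefix = f"event: {event}\n" if event else ""
    return prefix + f"data: {data}\n\n"


def parse_sse(raw: str) -> List[str]:
    """Extract data payloads from raw SSE text (complete events only)."""
    payloads = []
    for block in raw.split("\n\n"):
        lines = [ln[len("data: "):] for ln in block.split("\n")
                 if ln.startswith("data: ")]
        if lines:
            payloads.append("\n".join(lines))
    return payloads
```

The blank line between events is what lets the browser's EventSource know an event is complete, which is why the generator below always yields a trailing `\n\n`.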
import os

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


async def event_generator(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    # Stream chunks as SSE data events. In the v1 SDK, chunks are
    # Pydantic objects, so use attribute access rather than dict lookups.
    for chunk in response:
        if chunk.choices and chunk.choices[0].delta.content is not None:
            yield f"data: {chunk.choices[0].delta.content}\n\n"


@app.get("/stream")
async def stream(request: Request, prompt: str = "Hello, world!"):
    generator = event_generator(prompt)
    return StreamingResponse(generator, media_type="text/event-stream")

# Run with: uvicorn filename:app --reload

Common variations
- Async streaming: The example wraps a synchronous OpenAI client in an async generator; for fully non-blocking streaming, use AsyncOpenAI and iterate with async for.
- Different models: Change model="gpt-4o" to any supported streaming model such as gpt-4.1, or to Anthropic's claude-3-5-haiku-20241022 with its respective SDK.
- Client-side: Use JavaScript's EventSource to consume SSE and update the UI in real time.
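Outside the browser, the same stream can also be consumed from Python. The sketch below is an incremental SSE parser that works over any iterable of text chunks; here it is fed from an in-memory list so it runs standalone, but with a live endpoint you would feed it the HTTP response body instead (iter_sse_data is a hypothetical helper, not a library function):

```python
from typing import Iterable, Iterator


def iter_sse_data(chunks: Iterable[str]) -> Iterator[str]:
    """Yield data payloads from a stream of SSE text chunks.

    Buffers partial input until the blank-line event terminator
    arrives, so network chunk boundaries need not align with
    event boundaries.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n\n" in buffer:
            block, buffer = buffer.split("\n\n", 1)
            for line in block.split("\n"):
                if line.startswith("data: "):
                    yield line[len("data: "):]


# Simulated network reads that split one event across chunks:
fake_stream = ["data: Hel", "lo\n\nda", "ta: world\n\n"]
print(list(iter_sse_data(fake_stream)))  # ['Hello', 'world']
```

Buffering until the `\n\n` terminator is the important part: a real TCP read can end mid-event, so parsing chunk-by-chunk without a buffer would drop or corrupt payloads.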
Troubleshooting
- If the stream does not start, verify your API key is set correctly in os.environ["OPENAI_API_KEY"].
- If the client hangs, ensure the response uses text/event-stream and the generator yields data in the data: ...\n\n SSE format.
- Check network or CORS issues if the browser client cannot connect to the SSE endpoint.
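For the CORS case specifically, FastAPI ships Starlette's CORSMiddleware, which can whitelist the browser origin serving your front end. A minimal configuration sketch (the localhost origin is a placeholder; substitute your actual client origin):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow the front-end origin to read the SSE stream.
# "http://localhost:3000" is a placeholder for your client's origin.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["GET"],
    allow_headers=["*"],
)
```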
Key Takeaways
- Use FastAPI's StreamingResponse with media_type 'text/event-stream' to implement SSE.
- Enable streaming in the OpenAI SDK by setting stream=True to receive partial LLM outputs.
- Yield each chunk prefixed with 'data: ' and double newline to comply with SSE format.
- Use async generators in FastAPI for efficient non-blocking streaming.
- Test SSE endpoints with JavaScript EventSource for real-time client updates.