How to use Server-Sent Events with FastAPI for LLM streaming
Quick answer
Use FastAPI's StreamingResponse with the text/event-stream content type to implement Server-Sent Events (SSE) for LLM streaming. Connect to an LLM API such as OpenAI's gpt-4o with streaming enabled, then yield chunks as SSE events so the client receives real-time updates.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install fastapi uvicorn "openai>=1.0"
Setup
Install the required packages and set your OpenAI API key as an environment variable.
- Install FastAPI, Uvicorn, and the OpenAI SDK (quote the version spec so the shell does not treat > as a redirect):

pip install fastapi uvicorn "openai>=1.0"

Step by step
This example shows a complete FastAPI app that streams responses from OpenAI's gpt-4o model using Server-Sent Events (SSE). The client receives partial completions in real time.
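Before the full app, it helps to see the SSE wire format itself: each event is one or more data: lines terminated by a blank line. A minimal sketch of a formatter and parser, using only the standard library (the helper names format_sse and parse_sse are illustrative, not part of any SDK):

```python
from typing import List, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Format one payload as a Server-Sent Event string."""
    prefix = f"event: {event}\n" if event else ""
    return prefix + f"data: {data}\n\n"


def parse_sse(raw: str) -> List[str]:
    """Extract data payloads from raw SSE text (complete events only)."""
    payloads = []
    for block in raw.split("\n\n"):
        lines = [ln[len("data: "):] for ln in block.split("\n")
                 if ln.startswith("data: ")]
        if lines:
            payloads.append("\n".join(lines))
    return payloads
```

The blank line between events is what lets the browser's EventSource know an event is complete, which is why the generator below always yields a trailing `\n\n`.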
import os

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


async def event_generator(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    # Stream chunks as SSE data events. In the v1 SDK, chunks are
    # Pydantic objects, so use attribute access rather than dict lookups.
    for chunk in response:
        if chunk.choices and chunk.choices[0].delta.content is not None:
            yield f"data: {chunk.choices[0].delta.content}\n\n"


@app.get("/stream")
async def stream(request: Request, prompt: str = "Hello, world!"):
    generator = event_generator(prompt)
    return StreamingResponse(generator, media_type="text/event-stream")

# Run with: uvicorn filename:app --reload

Common variations
- Async streaming: The example wraps a synchronous OpenAI client in an async generator; for fully non-blocking streaming, use AsyncOpenAI and iterate with async for.
- Different models: Change model="gpt-4o" to any supported streaming model such as gpt-4.1, or to Anthropic's claude-3-5-haiku-20241022 with its respective SDK.
- Client-side: Use JavaScript's EventSource to consume SSE and update the UI in real time.
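Outside the browser, the same stream can also be consumed from Python. The sketch below is an incremental SSE parser that works over any iterable of text chunks; here it is fed from an in-memory list so it runs standalone, but with a live endpoint you would feed it the HTTP response body instead (iter_sse_data is a hypothetical helper, not a library function):

```python
from typing import Iterable, Iterator


def iter_sse_data(chunks: Iterable[str]) -> Iterator[str]:
    """Yield data payloads from a stream of SSE text chunks.

    Buffers partial input until the blank-line event terminator
    arrives, so network chunk boundaries need not align with
    event boundaries.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n\n" in buffer:
            block, buffer = buffer.split("\n\n", 1)
            for line in block.split("\n"):
                if line.startswith("data: "):
                    yield line[len("data: "):]


# Simulated network reads that split one event across chunks:
fake_stream = ["data: Hel", "lo\n\nda", "ta: world\n\n"]
print(list(iter_sse_data(fake_stream)))  # ['Hello', 'world']
```

Buffering until the `\n\n` terminator is the important part: a real TCP read can end mid-event, so parsing chunk-by-chunk without a buffer would drop or corrupt payloads.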
Troubleshooting
- If the stream does not start, verify your API key is set correctly in os.environ["OPENAI_API_KEY"].
- If the client hangs, ensure the response uses text/event-stream and the generator yields data in the data: ...\n\n SSE format.
- Check network or CORS issues if the browser client cannot connect to the SSE endpoint.
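For the CORS case specifically, FastAPI ships Starlette's CORSMiddleware, which can whitelist the browser origin serving your front end. A minimal configuration sketch (the localhost origin is a placeholder; substitute your actual client origin):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow the front-end origin to read the SSE stream.
# "http://localhost:3000" is a placeholder for your client's origin.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["GET"],
    allow_headers=["*"],
)
```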
Key Takeaways
- Use FastAPI's StreamingResponse with media_type 'text/event-stream' to implement SSE.
- Enable streaming in the OpenAI SDK by setting stream=True to receive partial LLM outputs.
- Yield each chunk prefixed with 'data: ' and double newline to comply with SSE format.
- Use async generators in FastAPI for efficient non-blocking streaming.
- Test SSE endpoints with JavaScript EventSource for real-time client updates.