How-to · Beginner · 3 min read

FastAPI StreamingResponse for LLM

Quick answer
Use FastAPI's StreamingResponse to stream tokens from an LLM: call client.chat.completions.create with stream=True on an AsyncOpenAI client, iterate over the resulting stream in an async generator, and yield each token delta as a Server-Sent Event for real-time streaming in Python.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install fastapi uvicorn "openai>=1.0" (quote the version specifier so the shell doesn't treat >= as a redirect)

Setup

Install the required packages and set your OpenAI API key as an environment variable.

  • Install FastAPI, Uvicorn, and the OpenAI SDK:
bash
pip install fastapi uvicorn "openai>=1.0"
output
Collecting fastapi
Collecting uvicorn
Collecting openai
Successfully installed fastapi uvicorn openai

Step by step

This example shows a complete FastAPI app that streams LLM chat completions using the OpenAI SDK's stream=True parameter and returns a StreamingResponse with Server-Sent Events (SSE).

python
import os
import json
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()
# AsyncOpenAI (not the sync OpenAI client) is required for `async for` below
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def event_stream(messages):
    # Create a streaming chat completion; the async client must be awaited
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    async for chunk in stream:
        # Some chunks carry no choices (e.g. usage-only chunks); skip them
        if not chunk.choices:
            continue
        # Extract the token delta for this chunk
        delta = chunk.choices[0].delta.content
        if delta:
            # Format as an SSE data frame
            yield f"data: {json.dumps(delta)}\n\n"

@app.post("/chat/stream")
async def chat_stream(request: Request):
    data = await request.json()
    user_message = data.get("message", "")
    messages = [{"role": "user", "content": user_message}]
    return StreamingResponse(event_stream(messages), media_type="text/event-stream")

# To run:
# uvicorn filename:app --reload
output
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
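On the client side, each SSE frame arrives as a data: line followed by a blank line. A minimal sketch of the parsing step, using illustrative sample frames rather than real API output (the parse_sse_frames helper is not part of any SDK):

```python
import json

def parse_sse_frames(raw: str):
    """Extract JSON payloads from the `data: ...` lines of an SSE stream."""
    tokens = []
    for line in raw.splitlines():
        if line.startswith("data: "):
            # Each frame carries one JSON-encoded token delta
            tokens.append(json.loads(line[len("data: "):]))
    return tokens

# Example: two frames in the shape the server above emits
raw = 'data: "Hello"\n\ndata: " world"\n\n'
print(parse_sse_frames(raw))  # ['Hello', ' world']
```

A real client would read these frames incrementally from the HTTP response body instead of a string, but the framing logic is the same.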

Common variations

  • Async vs sync: Use AsyncOpenAI with async iteration for streaming in FastAPI; iterating a sync stream inside the handler blocks the event loop.
  • Different models: Change model="gpt-4o-mini" to any supported OpenAI chat model.
  • Non-SSE streaming: You can adapt the generator to other streaming protocols if needed.

Troubleshooting

  • If streaming hangs, verify your API key and network connectivity.
  • Ensure stream=True is set; otherwise, the response is not streamed.
  • Check that the code uses the OpenAI SDK v1+ pattern with AsyncOpenAI(api_key=...); the sync OpenAI client cannot be iterated with async for.

Key takeaways

  • Use stream=True with client.chat.completions.create to get token streams.
  • Wrap the async token stream in a FastAPI StreamingResponse with text/event-stream media type for SSE.
  • Always use async iteration to avoid blocking FastAPI's event loop during streaming.
Verified 2026-04 · gpt-4o-mini