How-to · Intermediate · 3 min read

How to stream OpenAI responses with FastAPI

Quick answer
Use the OpenAI SDK's streaming mode together with FastAPI's StreamingResponse to forward tokens to the client as they arrive. Users see output from models like gpt-4o in real time instead of waiting for the full completion.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" fastapi uvicorn

Setup

Install the required packages and set your OpenAI API key as an environment variable.

  • Install FastAPI and Uvicorn for the web server.
  • Install the OpenAI Python SDK version 1.0 or higher.
  • Set OPENAI_API_KEY in your environment.
bash
pip install "openai>=1.0" fastapi uvicorn
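The key can then be exported in a POSIX shell before starting the server (the value shown is a placeholder, not a real key):

```shell
# Make the key available to the server process (placeholder value)
export OPENAI_API_KEY="sk-your-key-here"
```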

Step by step

This example shows a complete FastAPI app that streams tokens from the gpt-4o model using the OpenAI SDK's streaming interface. The server sends tokens as a streaming HTTP response.

python
import os
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()
# AsyncOpenAI keeps the event loop free while waiting on the API;
# the synchronous OpenAI client would block it inside an async app.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def stream_openai_response(messages):
    # stream=True makes the API return chunks as they are generated
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True,
    )
    async for chunk in response:
        # Some chunks (e.g. the final one) carry no content delta
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content

@app.post("/chat/stream")
async def chat_stream(request: Request):
    data = await request.json()
    messages = data.get("messages", [{"role": "user", "content": "Hello"}])
    return StreamingResponse(stream_openai_response(messages), media_type="text/plain")
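The core pattern above, an async generator feeding StreamingResponse, can be exercised without an API key by swapping in a stand-in generator. `fake_tokens` below is a hypothetical helper for illustration, not part of the OpenAI SDK:

```python
import asyncio

async def fake_tokens():
    # Stand-in for the OpenAI stream: yields tokens one at a time
    for token in ["Hello", ", ", "world"]:
        await asyncio.sleep(0)  # simulate waiting on the network
        yield token

async def collect():
    # StreamingResponse consumes an async generator the same way
    return [t async for t in fake_tokens()]

chunks = asyncio.run(collect())
print("".join(chunks))  # prints "Hello, world"
```

A real client would receive these chunks incrementally over HTTP rather than as a single joined string.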

Common variations

  • Async streaming: The example uses an async generator, so the event loop can serve other requests while tokens arrive.
  • Different models: Change model="gpt-4o" to any supported streaming model.
  • Custom media types: Use media_type="text/event-stream" for SSE clients.
python
from fastapi.responses import StreamingResponse

# SSE clients expect each message framed as "data: ...\n\n"
async def sse_stream(messages):
    async for token in stream_openai_response(messages):
        yield f"data: {token}\n\n"

# Inside the endpoint:
return StreamingResponse(sse_stream(messages), media_type="text/event-stream")

Troubleshooting

  • If you get authentication errors, verify OPENAI_API_KEY is set correctly.
  • For connection timeouts, check your network and API endpoint availability.
  • If streaming yields no output, confirm the model supports streaming and stream=True is set.
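A fail-fast check at startup turns a missing key into one clear error instead of an authentication failure on the first request. `require_api_key` is a sketch for illustration, not an SDK function:

```python
import os

def require_api_key(env=os.environ):
    # Raise at startup rather than on the first API call
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before starting the server")
    return key
```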

Key Takeaways

  • Use stream=True in client.chat.completions.create to enable streaming.
  • FastAPI's StreamingResponse efficiently sends tokens to clients as they arrive.
  • Always read the API key from os.environ to keep credentials secure.
Verified 2026-04 · gpt-4o