How-to · Intermediate · 3 min read

How to stream OpenAI responses with FastAPI

Quick answer
Use the OpenAI SDK's streaming mode together with FastAPI's StreamingResponse to forward tokens to the client as they arrive. Users see output from models like gpt-4o in real time instead of waiting for the full completion.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" fastapi uvicorn

Setup

Install the required packages and set your OpenAI API key as an environment variable.

  • Install FastAPI and Uvicorn for the web server.
  • Install the OpenAI Python SDK version 1.0 or higher.
  • Set OPENAI_API_KEY in your environment.
bash
pip install "openai>=1.0" fastapi uvicorn
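The key can then be exported in a POSIX shell before starting the server (the value shown is a placeholder, not a real key):

```shell
# Make the key available to the server process (placeholder value)
export OPENAI_API_KEY="sk-your-key-here"
```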

Step by step

This example shows a complete FastAPI app that streams tokens from the gpt-4o model using the OpenAI SDK's streaming interface. The server sends tokens as a streaming HTTP response.

python
import os
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()
# AsyncOpenAI keeps the event loop free while waiting on the API;
# the synchronous OpenAI client would block it inside an async app.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def stream_openai_response(messages):
    # stream=True makes the API return chunks as they are generated
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True,
    )
    async for chunk in response:
        # Some chunks (e.g. the final one) carry no content delta
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content

@app.post("/chat/stream")
async def chat_stream(request: Request):
    data = await request.json()
    messages = data.get("messages", [{"role": "user", "content": "Hello"}])
    return StreamingResponse(stream_openai_response(messages), media_type="text/plain")
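The core pattern above, an async generator feeding StreamingResponse, can be exercised without an API key by swapping in a stand-in generator. `fake_tokens` below is a hypothetical helper for illustration, not part of the OpenAI SDK:

```python
import asyncio

async def fake_tokens():
    # Stand-in for the OpenAI stream: yields tokens one at a time
    for token in ["Hello", ", ", "world"]:
        await asyncio.sleep(0)  # simulate waiting on the network
        yield token

async def collect():
    # StreamingResponse consumes an async generator the same way
    return [t async for t in fake_tokens()]

chunks = asyncio.run(collect())
print("".join(chunks))  # prints "Hello, world"
```

A real client would receive these chunks incrementally over HTTP rather than as a single joined string.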

Common variations

  • Async streaming: The example uses an async generator, so the event loop can serve other requests while tokens arrive.
  • Different models: Change model="gpt-4o" to any supported streaming model.
  • Custom media types: Use media_type="text/event-stream" for SSE clients.
python
from fastapi.responses import StreamingResponse

# SSE clients expect each message framed as "data: ...\n\n"
async def sse_stream(messages):
    async for token in stream_openai_response(messages):
        yield f"data: {token}\n\n"

# Inside the endpoint:
return StreamingResponse(sse_stream(messages), media_type="text/event-stream")

Troubleshooting

  • If you get authentication errors, verify OPENAI_API_KEY is set correctly.
  • For connection timeouts, check your network and API endpoint availability.
  • If streaming yields no output, confirm the model supports streaming and stream=True is set.
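A fail-fast check at startup turns a missing key into one clear error instead of an authentication failure on the first request. `require_api_key` is a sketch for illustration, not an SDK function:

```python
import os

def require_api_key(env=os.environ):
    # Raise at startup rather than on the first API call
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before starting the server")
    return key
```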

Key Takeaways

  • Use stream=True in client.chat.completions.create to enable streaming.
  • FastAPI's StreamingResponse efficiently sends tokens to clients as they arrive.
  • Always read the API key from os.environ to keep credentials secure.
Verified 2026-04 · gpt-4o