How-to · Beginner · 3 min read

How to stream OpenAI responses to the browser

Quick answer
Call the OpenAI Python SDK's chat.completions.create method with stream=True to receive the response as a stream of chunks. Then implement a FastAPI endpoint that forwards those chunks to the browser as Server-Sent Events (SSE) for real-time display.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai fastapi uvicorn

Setup

Install the required Python packages openai, fastapi, and uvicorn for the API server and streaming support.

Set your OpenAI API key as an environment variable OPENAI_API_KEY before running the code.
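For example, on macOS or Linux (the key value below is a placeholder; use your real key):

```shell
# Set the API key for the current shell session (placeholder value)
export OPENAI_API_KEY="sk-..."
```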

bash
pip install openai fastapi uvicorn
output
Collecting openai
Collecting fastapi
Collecting uvicorn
Successfully installed openai fastapi uvicorn

Step by step

This example creates a FastAPI server with an endpoint /stream that streams OpenAI chat completions to the browser using Server-Sent Events (SSE).

The server calls client.chat.completions.create with stream=True and yields each chunk's content as SSE data.

python
import os
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def stream_openai():
    # A plain (sync) generator: StreamingResponse iterates it in a threadpool,
    # so the blocking OpenAI client never stalls the event loop.
    messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        # Note: a newline inside delta would break SSE framing; production code
        # should escape newlines or send JSON-encoded payloads.
        yield f"data: {delta}\n\n"

@app.get("/stream")
async def stream():
    return StreamingResponse(stream_openai(), media_type="text/event-stream")

# To run:
# uvicorn filename:app --reload --port 8000
output
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

# Visiting http://127.0.0.1:8000/stream directly shows the raw SSE lines
# ("data: ...") as they arrive; a real page consumes them with EventSource.

Common variations

  • Async streaming: Use the SDK's AsyncOpenAI client with async for so the event loop is never blocked.
  • Different models: Replace model="gpt-4o-mini" with other OpenAI models like gpt-4o or gpt-4.1.
  • JavaScript client: Use EventSource in the browser to consume SSE from the FastAPI endpoint.
javascript
/* JavaScript example to consume SSE from /stream endpoint */
const evtSource = new EventSource("http://localhost:8000/stream");
evtSource.onmessage = function(event) {
  const content = event.data;
  console.log("Received chunk:", content);
  // Append content to page element
  document.getElementById("output").textContent += content;
};
output
Received chunk: Quantum computing is a type of computing that uses quantum bits...
Received chunk: Unlike classical bits, quantum bits can be in multiple states...
...

Troubleshooting

  • If streaming hangs or returns no data, verify your API key and network connectivity.
  • Ensure the client supports streaming and you set stream=True.
  • For CORS issues in browser, configure FastAPI with appropriate CORS middleware.

Key Takeaways

  • Use stream=True in client.chat.completions.create to enable streaming from OpenAI.
  • Implement a FastAPI endpoint that yields streamed chunks as Server-Sent Events for browser consumption.
  • Use JavaScript EventSource to receive and display streamed tokens in real time.
  • Set your OpenAI API key securely via environment variables to avoid credential leaks.
Verified 2026-04 · gpt-4o-mini, gpt-4o, gpt-4.1