How-to · Beginner · 3 min read

How to stream OpenAI responses to the browser

Quick answer
Call the OpenAI Python SDK's chat.completions.create method with stream=True to receive the response as a stream of chunks. Then implement a FastAPI endpoint that forwards those chunks to the browser as Server-Sent Events (SSE) for real-time display.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai fastapi uvicorn

Setup

Install the required Python packages openai, fastapi, and uvicorn for the API server and streaming support.

Set your OpenAI API key as an environment variable OPENAI_API_KEY before running the code.
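For example, on macOS or Linux (the key value below is a placeholder; use your real key):

```shell
# Set the API key for the current shell session (placeholder value)
export OPENAI_API_KEY="sk-..."
```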

bash
pip install openai fastapi uvicorn
output
Collecting openai
Collecting fastapi
Collecting uvicorn
Successfully installed openai fastapi uvicorn

Step by step

This example creates a FastAPI server with an endpoint /stream that streams OpenAI chat completions to the browser using Server-Sent Events (SSE).

The server calls client.chat.completions.create with stream=True and yields each chunk's content as SSE data.

python
import os
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def stream_openai():
    # A plain (sync) generator: StreamingResponse iterates it in a threadpool,
    # so the blocking OpenAI client never stalls the event loop.
    messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        # Note: a newline inside delta would break SSE framing; production code
        # should escape newlines or send JSON-encoded payloads.
        yield f"data: {delta}\n\n"

@app.get("/stream")
async def stream():
    return StreamingResponse(stream_openai(), media_type="text/event-stream")

# To run:
# uvicorn filename:app --reload --port 8000
output
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

# Visiting http://127.0.0.1:8000/stream directly shows the raw SSE lines
# ("data: ...") as they arrive; a real page consumes them with EventSource.

Common variations

  • Async streaming: Use the SDK's AsyncOpenAI client with async for so the event loop is never blocked.
  • Different models: Replace model="gpt-4o-mini" with other OpenAI models like gpt-4o or gpt-4.1.
  • JavaScript client: Use EventSource in the browser to consume SSE from the FastAPI endpoint.
javascript
/* JavaScript example to consume SSE from /stream endpoint */
const evtSource = new EventSource("http://localhost:8000/stream");
evtSource.onmessage = function(event) {
  const content = event.data;
  console.log("Received chunk:", content);
  // Append content to page element
  document.getElementById("output").textContent += content;
};
output
Received chunk: Quantum computing is a type of computing that uses quantum bits...
Received chunk: Unlike classical bits, quantum bits can be in multiple states...
...

Troubleshooting

  • If streaming hangs or returns no data, verify your API key and network connectivity.
  • Ensure the client supports streaming and you set stream=True.
  • For CORS issues in browser, configure FastAPI with appropriate CORS middleware.

Key Takeaways

  • Use stream=True in client.chat.completions.create to enable streaming from OpenAI.
  • Implement a FastAPI endpoint that yields streamed chunks as Server-Sent Events for browser consumption.
  • Use JavaScript EventSource to receive and display streamed tokens in real time.
  • Set your OpenAI API key securely via environment variables to avoid credential leaks.
Verified 2026-04 · gpt-4o-mini, gpt-4o, gpt-4.1