How to stream OpenAI to browser
Quick answer
Use the OpenAI Python SDK's
chat.completions.create method with stream=True to receive streamed responses. Then implement a FastAPI endpoint that yields these chunks as Server-Sent Events (SSE) to the browser for real-time display.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai fastapi uvicorn
Setup
Install the required Python packages openai, fastapi, and uvicorn for the API server and streaming support.
Set your OpenAI API key as an environment variable OPENAI_API_KEY before running the code.
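To avoid a confusing failure deep inside the SDK when the variable is missing, you can fail fast at startup. A minimal sketch (the helper name require_api_key is illustrative, not part of the SDK):

```python
import os
import sys


def require_api_key() -> str:
    """Return the OpenAI API key from the environment, or exit with a clear message."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        sys.exit("OPENAI_API_KEY is not set; export it before starting the server.")
    return key
```

Call require_api_key() once at startup and pass the result to OpenAI(api_key=...).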
pip install openai fastapi uvicorn

Output:
Collecting openai
Collecting fastapi
Collecting uvicorn
Successfully installed openai fastapi uvicorn
Step by step
This example creates a FastAPI server with an endpoint /stream that streams OpenAI chat completions to the browser using Server-Sent Events (SSE).
The server calls client.chat.completions.create with stream=True and yields each chunk's content as SSE data.
import os

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


# A plain (sync) generator: StreamingResponse runs it in a thread pool,
# so the blocking OpenAI stream does not stall the event loop.
def stream_openai():
    messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks (e.g. usage-only) carry no choices
        delta = chunk.choices[0].delta.content or ""
        yield f"data: {delta}\n\n"


@app.get("/stream")
async def stream():
    return StreamingResponse(stream_openai(), media_type="text/event-stream")


# To run:
# uvicorn filename:app --reload --port 8000
Output:
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

When visiting http://127.0.0.1:8000/stream in a browser, streamed tokens appear in real time.
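One SSE subtlety the example above glosses over: a delta that contains a newline must be split into one data: line per text line, or the browser's EventSource will mis-parse the event. A small framing helper (a sketch; the name sse_event is illustrative):

```python
def sse_event(text: str) -> str:
    """Frame text as a single SSE event, splitting embedded newlines
    into multiple data: lines as the SSE format requires."""
    lines = text.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"
```

In the generator, yield sse_event(delta) instead of formatting the string by hand; EventSource reassembles the data: lines with newlines on the client side.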
Common variations
- Async streaming: use the SDK's AsyncOpenAI client with async for to stream without blocking the event loop.
- Different models: replace model="gpt-4o-mini" with other OpenAI models such as gpt-4o or gpt-4.1.
- JavaScript client: use EventSource in the browser to consume SSE from the FastAPI endpoint.
/* JavaScript example to consume SSE from /stream endpoint */
const evtSource = new EventSource("http://localhost:8000/stream");
evtSource.onmessage = function(event) {
    const content = event.data;
    console.log("Received chunk:", content);
    // Append content to page element
    document.getElementById("output").textContent += content;
};

Output:
Received chunk: Quantum computing is a type of computing that uses quantum bits...
Received chunk: Unlike classical bits, quantum bits can be in multiple states...
...
Troubleshooting
- If streaming hangs or returns no data, verify your API key and network connectivity.
- Ensure the client supports streaming and that you set stream=True.
- For CORS issues in the browser, configure FastAPI with the appropriate CORS middleware.
Key Takeaways
- Use stream=True in client.chat.completions.create to enable streaming from OpenAI.
- Implement a FastAPI endpoint that yields streamed chunks as Server-Sent Events for browser consumption.
- Use JavaScript EventSource to receive and display streamed tokens in real time.
- Set your OpenAI API key securely via environment variables to avoid credential leaks.