How to stream OpenAI responses to a web app
Quick answer
Use the OpenAI Python SDK v1 with the stream=True parameter in client.chat.completions.create to receive partial responses as they are generated. Forward these chunks to your web app via server-sent events (SSE) or WebSockets for real-time display.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
- Basic knowledge of a Python web framework (e.g., Flask or FastAPI)
Setup
Install the official OpenAI Python SDK version 1 or higher and set your API key as an environment variable.
Run:

pip install "openai>=1.0"

Then set your API key in your shell:

export OPENAI_API_KEY='your_api_key_here'

Step by step
This example uses Flask to create a simple web server that streams OpenAI chat completions to the client using Server-Sent Events (SSE). The stream=True parameter enables streaming partial responses from the API.
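One subtlety of the SSE wire format before the full app: each event is one or more "data:" lines followed by a blank line, so a payload that itself contains a newline must be split across multiple "data:" lines or the browser will cut the event short. A minimal helper sketch (the sse_format name is ours; the examples in this article inline a simpler one-line version, which is fine for single-line chunks):

```python
def sse_format(payload: str) -> str:
    """Encode a string as one server-sent event.

    Each line of the payload gets its own "data:" prefix so embedded
    newlines survive; the trailing blank line terminates the event.
    """
    lines = payload.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"
```

For example, sse_format("roses are red\nviolets are blue") produces two "data:" lines followed by the blank terminator, which the browser's EventSource rejoins into a single multi-line message.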
import os
from flask import Flask, Response, render_template_string
from openai import OpenAI

app = Flask(__name__)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

HTML = '''
<!doctype html>
<html>
<head><title>OpenAI Streaming Demo</title></head>
<body>
<h1>OpenAI Streaming Response</h1>
<pre id="output"></pre>
<script>
const evtSource = new EventSource("/stream");
const output = document.getElementById("output");
evtSource.onmessage = function(event) {
  if (event.data === "[DONE]") {
    evtSource.close();
    return;
  }
  output.textContent += event.data;
};
</script>
</body>
</html>
'''

@app.route("/")
def index():
    return render_template_string(HTML)

@app.route("/stream")
def stream():
    def generate():
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Write a short poem about streaming AI responses."}],
            stream=True,
        )
        for chunk in response:
            # In SDK v1 each chunk's delta is an object, not a dict,
            # so read the content attribute rather than testing "in".
            content = chunk.choices[0].delta.content
            if content is not None:
                yield f"data: {content}\n\n"
        yield "data: [DONE]\n\n"
    return Response(generate(), mimetype="text/event-stream")

if __name__ == "__main__":
    app.run(debug=True, threaded=True)

Output
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

When you visit the page, the poem streams into the browser chunk by chunk.
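On the client (or in tests), the mirror operation is joining the streamed deltas back into the full message. A sketch using SimpleNamespace objects as stand-ins for the SDK's ChatCompletionChunk models (the make_chunk and accumulate helpers are ours):

```python
from types import SimpleNamespace

def make_chunk(text):
    # Stand-in for a ChatCompletionChunk: exposes chunk.choices[0].delta.content
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def accumulate(chunks):
    """Join the non-empty deltas of a streamed response into one string."""
    parts = []
    for chunk in chunks:
        content = chunk.choices[0].delta.content
        if content is not None:  # the final chunk typically carries no content
            parts.append(content)
    return "".join(parts)

stream = [make_chunk("Hello"), make_chunk(", "), make_chunk("world"), make_chunk(None)]
print(accumulate(stream))  # Hello, world
```

This is also a convenient way to unit-test the server's generate() loop without making network calls.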
Common variations
- Async streaming: Use an async web framework like FastAPI with async for to handle streaming asynchronously.
- Different models: Replace model="gpt-4o" with any supported streaming model, such as gpt-4o-mini.
- WebSocket streaming: Use WebSockets instead of SSE for bidirectional communication.
import os
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()
# The v1 SDK provides a dedicated AsyncOpenAI client for async code;
# there is no acreate() method on the sync client.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.get("/stream")
async def stream():
    async def event_generator():
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Stream a joke."}],
            stream=True,
        )
        async for chunk in response:
            content = chunk.choices[0].delta.content
            if content is not None:
                yield f"data: {content}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(event_generator(), media_type="text/event-stream")

Troubleshooting
- If streaming hangs or returns no data, verify your API key and network connectivity.
- Ensure your client supports Server-Sent Events or WebSockets.
- Check for rate limits or quota exhaustion in your OpenAI account.
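For the rate-limit case, a common mitigation is to wrap the create call in exponential backoff. A generic sketch (the with_retries helper is ours; in a real app you would pass openai.RateLimitError as retry_on rather than catching every exception):

```python
import time

def with_retries(fn, retries=3, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying with exponential backoff on the listed exceptions."""
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Usage would look like with_retries(lambda: client.chat.completions.create(model="gpt-4o", messages=msgs, stream=True), retry_on=(openai.RateLimitError,)).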
Key Takeaways
- Use stream=True in client.chat.completions.create to enable streaming responses.
- Stream partial chunks to the frontend using Server-Sent Events or WebSockets for real-time updates.
- Use the official OpenAI Python SDK v1 with environment-based API keys for secure integration.