How to stream OpenAI responses to a web app
Quick answer
Use the OpenAI Python SDK v1 with the stream=True parameter in client.chat.completions.create to receive partial responses as they are generated. Forward these chunks to your web app via server-sent events (SSE) or WebSockets for real-time display.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
- Basic knowledge of a Python web framework (e.g., Flask or FastAPI)
Setup
Install the official OpenAI Python SDK version 1 or higher and set your API key as an environment variable.
Run:

pip install "openai>=1.0"

Then set your API key in your shell:

export OPENAI_API_KEY='your_api_key_here'

Step by step
This example uses Flask to create a simple web server that streams OpenAI chat completions to the client using Server-Sent Events (SSE). The stream=True parameter enables streaming partial responses from the API.
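One subtlety of the SSE wire format before the full app: each event is one or more "data:" lines followed by a blank line, so a payload that itself contains a newline must be split across multiple "data:" lines or the browser will cut the event short. A minimal helper sketch (the sse_format name is ours; the examples in this article inline a simpler one-line version, which is fine for single-line chunks):

```python
def sse_format(payload: str) -> str:
    """Encode a string as one server-sent event.

    Each line of the payload gets its own "data:" prefix so embedded
    newlines survive; the trailing blank line terminates the event.
    """
    lines = payload.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"
```

For example, sse_format("roses are red\nviolets are blue") produces two "data:" lines followed by the blank terminator, which the browser's EventSource rejoins into a single multi-line message.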
import os
from flask import Flask, Response, render_template_string
from openai import OpenAI

app = Flask(__name__)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

HTML = '''
<!doctype html>
<html>
<head><title>OpenAI Streaming Demo</title></head>
<body>
<h1>OpenAI Streaming Response</h1>
<pre id="output"></pre>
<script>
const evtSource = new EventSource("/stream");
const output = document.getElementById("output");
evtSource.onmessage = function(event) {
  if (event.data === "[DONE]") {
    evtSource.close();
    return;
  }
  output.textContent += event.data;
};
</script>
</body>
</html>
'''

@app.route("/")
def index():
    return render_template_string(HTML)

@app.route("/stream")
def stream():
    def generate():
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Write a short poem about streaming AI responses."}],
            stream=True,
        )
        for chunk in response:
            # In SDK v1 each chunk's delta is an object, not a dict,
            # so read the content attribute rather than testing "in".
            content = chunk.choices[0].delta.content
            if content is not None:
                yield f"data: {content}\n\n"
        yield "data: [DONE]\n\n"
    return Response(generate(), mimetype="text/event-stream")

if __name__ == "__main__":
    app.run(debug=True, threaded=True)

Output
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

When you visit the page, the poem streams into the browser chunk by chunk.
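On the client (or in tests), the mirror operation is joining the streamed deltas back into the full message. A sketch using SimpleNamespace objects as stand-ins for the SDK's ChatCompletionChunk models (the make_chunk and accumulate helpers are ours):

```python
from types import SimpleNamespace

def make_chunk(text):
    # Stand-in for a ChatCompletionChunk: exposes chunk.choices[0].delta.content
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def accumulate(chunks):
    """Join the non-empty deltas of a streamed response into one string."""
    parts = []
    for chunk in chunks:
        content = chunk.choices[0].delta.content
        if content is not None:  # the final chunk typically carries no content
            parts.append(content)
    return "".join(parts)

stream = [make_chunk("Hello"), make_chunk(", "), make_chunk("world"), make_chunk(None)]
print(accumulate(stream))  # Hello, world
```

This is also a convenient way to unit-test the server's generate() loop without making network calls.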
Common variations
- Async streaming: Use an async web framework like FastAPI with async for to handle streaming asynchronously.
- Different models: Replace model="gpt-4o" with any supported streaming model, such as gpt-4o-mini.
- WebSocket streaming: Use WebSockets instead of SSE for bidirectional communication.
import os
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()
# The v1 SDK provides a dedicated AsyncOpenAI client for async code;
# there is no acreate() method on the sync client.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.get("/stream")
async def stream():
    async def event_generator():
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Stream a joke."}],
            stream=True,
        )
        async for chunk in response:
            content = chunk.choices[0].delta.content
            if content is not None:
                yield f"data: {content}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(event_generator(), media_type="text/event-stream")

Troubleshooting
- If streaming hangs or returns no data, verify your API key and network connectivity.
- Ensure your client supports Server-Sent Events or WebSockets.
- Check for rate limits or quota exhaustion in your OpenAI account.
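For the rate-limit case, a common mitigation is to wrap the create call in exponential backoff. A generic sketch (the with_retries helper is ours; in a real app you would pass openai.RateLimitError as retry_on rather than catching every exception):

```python
import time

def with_retries(fn, retries=3, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying with exponential backoff on the listed exceptions."""
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Usage would look like with_retries(lambda: client.chat.completions.create(model="gpt-4o", messages=msgs, stream=True), retry_on=(openai.RateLimitError,)).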
Key Takeaways
- Use stream=True in client.chat.completions.create to enable streaming responses.
- Stream partial chunks to the frontend using Server-Sent Events or WebSockets for real-time updates.
- Use the official OpenAI Python SDK v1 with environment-based API keys for secure integration.