AI response streaming in product UI
Quick answer
Use the stream=True parameter in the client.chat.completions.create method to receive partial AI responses as they are generated. Render these chunks in your product UI to display the AI's output in real time, improving responsiveness and user engagement.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the official openai Python package and set your API key as an environment variable for secure authentication.
pip install openai>=1.0

Output:
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
This example demonstrates streaming AI responses using the OpenAI SDK's chat.completions.create method with stream=True. The code prints tokens as they arrive, simulating real-time UI updates.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    stream=True,
)
print("AI response streaming:")
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
print()

Output:
AI response streaming: Quantum computing is a type of computing that uses quantum bits, or qubits, which can be in multiple states at once, allowing complex calculations to be done faster than traditional computers.
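In a product UI you typically keep the full text accumulated so far and re-render it after each chunk, rather than appending to a terminal. A minimal sketch of that accumulation pattern, independent of the API (the function name and simulated chunks are illustrative, not part of the SDK):

```python
def accumulate_deltas(deltas):
    """Fold streamed delta strings into the growing text a UI would render.

    Yields the full text after each chunk; many UIs simply re-render
    the in-progress message with each yielded state.
    """
    text = ""
    for delta in deltas:
        text += delta or ""  # final chunks may carry None content
        yield text

# Simulated deltas, standing in for chunk.choices[0].delta.content values
chunks = ["Quantum ", "computing ", "uses ", "qubits."]
states = list(accumulate_deltas(chunks))
```

Each element of states is one UI repaint; the last element is the complete message, which you would store once the stream ends.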
Common variations
- Use asynchronous streaming with async for in async Python environments (via the AsyncOpenAI client).
- Switch models by changing the model parameter (e.g., gpt-4o-mini for faster, smaller responses).
- Integrate streaming with frontend frameworks by sending chunks over WebSockets or Server-Sent Events (SSE) for live UI updates.
import asyncio
import os
from openai import AsyncOpenAI

async def async_stream():
    # The async client is required so create() can be awaited
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Summarize AI response streaming."}]
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    print("Async AI streaming:")
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
    print()

asyncio.run(async_stream())

Output:
Async AI streaming: AI response streaming allows your product UI to display text as it is generated, improving user experience by reducing wait times.
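For the SSE variation, each chunk is sent to the browser as a "data:" line terminated by a blank line. A minimal framing helper (the function name is illustrative; wiring it into FastAPI, Flask, or another framework's streaming response is left as an assumption):

```python
import json

def sse_frame(delta: str) -> str:
    """Wrap one streamed delta as a Server-Sent Events frame.

    An SSE event is a 'data: <payload>' line followed by a blank line;
    JSON-encoding the payload keeps newlines inside the delta intact.
    """
    return f"data: {json.dumps({'delta': delta})}\n\n"
```

In a streaming HTTP response you would yield sse_frame(delta) for each chunk from the loop above, and the browser's EventSource API would receive one event per delta.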
Troubleshooting
- If streaming hangs or returns no data, verify your API key and network connectivity.
- Ensure your environment supports asynchronous code if using async streaming.
- For frontend integration, confirm your WebSocket or SSE implementation correctly handles partial data chunks.
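A stream can also fail mid-generation, leaving the UI with a half-finished message. One way to fail gracefully is to wrap the chunk iterator so an error yields a visible placeholder instead of hanging; a sketch under the assumption that your UI consumes plain delta strings (the function names are illustrative):

```python
def stream_with_fallback(deltas, on_error_message="[stream interrupted]"):
    """Yield deltas from a chunk iterator, converting a mid-stream
    failure into a final placeholder the UI can display."""
    try:
        for delta in deltas:
            yield delta
    except Exception:
        # In production, log the error and offer a retry; with the
        # OpenAI SDK you would catch openai.APIError subclasses here.
        yield on_error_message

def flaky():
    """Simulated stream that drops the connection partway through."""
    yield "Hello, "
    raise ConnectionError("network dropped")

rendered = "".join(stream_with_fallback(flaky()))
# rendered == "Hello, [stream interrupted]"
```

The same wrapper works unchanged on a healthy stream, passing every delta through untouched.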
Key Takeaways
- Use stream=True in chat.completions.create to receive partial AI outputs in real time.
- Streamed tokens can be rendered incrementally in your UI for a responsive user experience.
- Async streaming enables non-blocking UI updates in modern Python applications.
- Switch models to balance speed and quality depending on your product needs.
- Proper error handling and environment setup are essential for reliable streaming.