How-to · Intermediate · 3 min read

How to stream AI responses in a web app

Quick answer
Set stream=True in the OpenAI API chat completion call to receive partial responses incrementally. Your web app can then display tokens as they arrive, creating a real-time streaming experience.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"
  • Basic knowledge of asynchronous programming or event-driven frameworks

Setup

Install the OpenAI Python SDK and set your API key as an environment variable to securely authenticate your requests.

bash
pip install "openai>=1.0"
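With the SDK installed, make the key available as an environment variable so the client can authenticate. The key value below is a placeholder:

```shell
# Set the API key for the current shell session (placeholder value).
export OPENAI_API_KEY="your-api-key-here"
```

Add the line to your shell profile (or a .env file loaded by your app) to persist it across sessions.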

Step by step

This example demonstrates how to stream AI responses token-by-token using the OpenAI Python SDK in a simple script. The stream=True parameter enables incremental output.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def stream_chat():
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain streaming AI responses."}],
        stream=True
    )

    for chunk in response:
        # Each chunk contains partial tokens
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)

if __name__ == "__main__":
    stream_chat()
output
(the model's explanation prints to the terminal token by token)

Common variations

  • Async streaming: Use async iterators with frameworks like FastAPI or asyncio for non-blocking UI updates.
  • Different models: Streaming works similarly with gpt-4o, gpt-4o-mini, and other OpenAI chat models.
  • JavaScript clients: Use fetch with ReadableStream or WebSockets to stream tokens in browser apps.
python
import asyncio
import os
from openai import AsyncOpenAI

# Async streaming uses the AsyncOpenAI client; the request is the
# same create() call, awaited, and the result is an async iterator.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def async_stream_chat():
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain streaming AI responses asynchronously."}],
        stream=True
    )

    async for chunk in response:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)

if __name__ == "__main__":
    asyncio.run(async_stream_chat())
output
(streamed tokens print as they arrive, without blocking the event loop)

Troubleshooting

  • If streaming does not start, verify your API key and network connectivity.
  • Ensure your environment supports asynchronous iteration if using async streaming.
  • Check that the stream=True parameter is set; otherwise, the response will be returned all at once.

Key Takeaways

  • Enable streaming by setting stream=True in your chat completion request.
  • Process incremental tokens from the response iterator to update your UI in real time.
  • Use async streaming for non-blocking web frameworks and better user experience.
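The chunk-handling logic above can be exercised without a network call by mocking the chunk shape the SDK streams (choices[0].delta.content). The helper names below are illustrative:

```python
from types import SimpleNamespace

def make_chunk(text):
    # Mirror the nested streamed-chunk shape: chunk.choices[0].delta.content
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def collect(chunks):
    """Accumulate streamed deltas into the full message, skipping empty deltas."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            parts.append(delta.content)
    return "".join(parts)

stream = [make_chunk("Stream"), make_chunk("ing "), make_chunk(None), make_chunk("works.")]
print(collect(stream))  # -> Streaming works.
```

Testing against stand-in chunks like this keeps UI and accumulation logic verifiable without spending tokens or requiring an API key.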
Verified 2026-04 · gpt-4o, gpt-4o-mini