How-to · Beginner · 3 min read

Responses API streaming events explained

Quick answer
The Responses API's streaming events deliver partial model output incrementally as chunks, enabling real-time, token-by-token processing. Pass stream=True to client.chat.completions.create() to receive these events; each chunk carries a delta object holding newly generated tokens or metadata.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quote the spec so the shell does not treat >= as a redirection)

Setup

Install the official OpenAI Python SDK (version 1.0 or later) and set your API key as an environment variable so it stays out of your source code.

bash
pip install "openai>=1.0"
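The SDK reads the OPENAI_API_KEY environment variable automatically when no api_key argument is passed, so exporting it once per shell session is enough (the sk-... value below is a placeholder for your real key):

```bash
# Make the key available to the SDK for this shell session.
# The client falls back to OPENAI_API_KEY when api_key is omitted.
export OPENAI_API_KEY="sk-..."
```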

Step by step

This example demonstrates how to stream responses from the OpenAI chat completions endpoint using the stream=True parameter. The client yields chunks containing incremental tokens in the delta.content field, which you can process in real time.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

messages = [{"role": "user", "content": "Explain streaming events in the OpenAI Responses API."}]

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    stream=True
)

print("Streaming response:")
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:  # delta.content is None on role/metadata chunks
        print(delta.content, end="", flush=True)
print()
output
Streaming response:
The OpenAI Responses API streaming events deliver tokens incrementally, allowing real-time processing of model output.
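In practice you usually want the assembled reply as well as the live printout. The accumulation pattern is independent of the network layer, so it can be sketched with stand-in chunk objects; the SimpleNamespace values below are only mocks of the SDK's chunk type, not real API responses:

```python
from types import SimpleNamespace

def mock_chunk(content):
    # Stand-in for the SDK's ChatCompletionChunk: one choice, one delta.
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=content))]
    )

# A short simulated stream: a role-only first chunk (content=None), then tokens.
stream = [mock_chunk(None), mock_chunk("Hello"), mock_chunk(", "), mock_chunk("world.")]

parts = []
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:          # skip chunks that carry no text
        parts.append(delta.content)

full_text = "".join(parts)
print(full_text)  # Hello, world.
```

The same two lines inside the loop drop into the real streaming loop unchanged; only the source of the chunks differs.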

Common variations

  • Async streaming: Use async for with an async client to handle streaming asynchronously.
  • Different models: Replace model="gpt-4o-mini" with any other streaming-capable chat model, such as gpt-4o or gpt-4-turbo. (Anthropic models like claude-3-5-sonnet-20241022 also stream, but through Anthropic's own SDK, not this client.)
  • Handling finish reasons: Check chunk.choices[0].finish_reason to detect end of stream or tool calls.
python
import asyncio
import os
from openai import AsyncOpenAI

async def async_stream():
    # AsyncOpenAI (not OpenAI) is required to iterate with `async for`.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Explain streaming events asynchronously."}]
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:  # None on role/metadata chunks
            print(delta.content, end="", flush=True)
    print()

if __name__ == "__main__":
    asyncio.run(async_stream())
output
(tokens print incrementally as the model streams its explanation)
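The finish_reason check from the variations list can be sketched without a live API call. The SimpleNamespace objects below are only mocks of the SDK's chunk type; real chunks set finish_reason on the choice, None until the final chunk:

```python
from types import SimpleNamespace

def mock_chunk(content, finish_reason=None):
    # Stand-in for ChatCompletionChunk: finish_reason stays None
    # until the stream ends.
    choice = SimpleNamespace(delta=SimpleNamespace(content=content),
                             finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])

stream = [mock_chunk("Stream"), mock_chunk("ing done."),
          mock_chunk(None, finish_reason="stop")]

reason = None
for chunk in stream:
    choice = chunk.choices[0]
    if choice.delta.content:
        print(choice.delta.content, end="", flush=True)
    if choice.finish_reason is not None:   # "stop", "length", "tool_calls", ...
        reason = choice.finish_reason
print()
print("finish_reason:", reason)  # finish_reason: stop
```

A finish_reason of "length" signals truncation by max_tokens, and "tool_calls" means the model wants to invoke a tool rather than finish its text.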

Troubleshooting

  • If streaming yields no tokens, verify your stream=True parameter is set and your API key is valid.
  • For incomplete streams, check network stability and handle finish_reason properly.
  • Access tokens via chunk.choices[0].delta.content, not chunk.choices[0].message.content; message exists only on non-streaming responses, and delta.content may be None on role or metadata chunks.
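A small defensive accessor keeps that last pitfall from biting. The helper below is illustrative, not part of the SDK, and the SimpleNamespace chunks only stand in for the real chunk type:

```python
from types import SimpleNamespace

def token_of(chunk):
    # Illustrative helper (not an SDK function): return the chunk's text,
    # or "" when the chunk has no choices or carries no content.
    if not chunk.choices:
        return ""
    return chunk.choices[0].delta.content or ""

# Mock chunks: a normal token, a metadata-only chunk, an empty-choices chunk.
token = SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="hi"))])
meta  = SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))])
empty = SimpleNamespace(choices=[])

print([token_of(c) for c in (token, meta, empty)])  # ['hi', '', '']
```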

Key Takeaways

  • Use stream=True in client.chat.completions.create() to receive incremental tokens as streaming events.
  • Each streaming chunk contains a delta object with partial content for real-time processing.
  • Async streaming enables non-blocking token consumption with async for loops.
  • Check finish_reason in chunks to detect stream completion or tool calls.
  • Always access streaming tokens via chunk.choices[0].delta.content and guard for None, since role and metadata chunks carry no text.
Verified 2026-04 · gpt-4o-mini, claude-3-5-sonnet-20241022