How-to · Beginner · 3 min read

Responses API streaming events explained

Quick answer
The Responses API's streaming events deliver partial model output incrementally as chunks, enabling real-time, token-by-token processing. Pass stream=True to client.chat.completions.create() to receive these events; each chunk carries a delta object holding newly generated tokens or metadata.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quote the spec so the shell does not treat >= as a redirection)

Setup

Install the official OpenAI Python SDK (version 1.0 or later) and set your API key as an environment variable so it stays out of your source code.

bash
pip install "openai>=1.0"
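The SDK reads the OPENAI_API_KEY environment variable automatically when no api_key argument is passed, so exporting it once per shell session is enough (the sk-... value below is a placeholder for your real key):

```bash
# Make the key available to the SDK for this shell session.
# The client falls back to OPENAI_API_KEY when api_key is omitted.
export OPENAI_API_KEY="sk-..."
```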

Step by step

This example demonstrates how to stream responses from the OpenAI chat completions endpoint using the stream=True parameter. The client yields chunks containing incremental tokens in the delta.content field, which you can process in real time.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

messages = [{"role": "user", "content": "Explain streaming events in the OpenAI Responses API."}]

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    stream=True
)

print("Streaming response:")
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:  # delta.content is None on role/metadata chunks
        print(delta.content, end="", flush=True)
print()
output
Streaming response:
The OpenAI Responses API streaming events deliver tokens incrementally, allowing real-time processing of model output.
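In practice you usually want the assembled reply as well as the live printout. The accumulation pattern is independent of the network layer, so it can be sketched with stand-in chunk objects; the SimpleNamespace values below are only mocks of the SDK's chunk type, not real API responses:

```python
from types import SimpleNamespace

def mock_chunk(content):
    # Stand-in for the SDK's ChatCompletionChunk: one choice, one delta.
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=content))]
    )

# A short simulated stream: a role-only first chunk (content=None), then tokens.
stream = [mock_chunk(None), mock_chunk("Hello"), mock_chunk(", "), mock_chunk("world.")]

parts = []
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:          # skip chunks that carry no text
        parts.append(delta.content)

full_text = "".join(parts)
print(full_text)  # Hello, world.
```

The same two lines inside the loop drop into the real streaming loop unchanged; only the source of the chunks differs.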

Common variations

  • Async streaming: Use async for with an async client to handle streaming asynchronously.
  • Different models: Replace model="gpt-4o-mini" with any other streaming-capable chat model, such as gpt-4o or gpt-4-turbo. (Anthropic models like claude-3-5-sonnet-20241022 also stream, but through Anthropic's own SDK, not this client.)
  • Handling finish reasons: Check chunk.choices[0].finish_reason to detect end of stream or tool calls.
python
import asyncio
import os
from openai import AsyncOpenAI

async def async_stream():
    # AsyncOpenAI (not OpenAI) is required to iterate with `async for`.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": "Explain streaming events asynchronously."}]
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:  # None on role/metadata chunks
            print(delta.content, end="", flush=True)
    print()

if __name__ == "__main__":
    asyncio.run(async_stream())
output
(tokens print incrementally as the model streams its explanation)
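The finish_reason check from the variations list can be sketched without a live API call. The SimpleNamespace objects below are only mocks of the SDK's chunk type; real chunks set finish_reason on the choice, None until the final chunk:

```python
from types import SimpleNamespace

def mock_chunk(content, finish_reason=None):
    # Stand-in for ChatCompletionChunk: finish_reason stays None
    # until the stream ends.
    choice = SimpleNamespace(delta=SimpleNamespace(content=content),
                             finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])

stream = [mock_chunk("Stream"), mock_chunk("ing done."),
          mock_chunk(None, finish_reason="stop")]

reason = None
for chunk in stream:
    choice = chunk.choices[0]
    if choice.delta.content:
        print(choice.delta.content, end="", flush=True)
    if choice.finish_reason is not None:   # "stop", "length", "tool_calls", ...
        reason = choice.finish_reason
print()
print("finish_reason:", reason)  # finish_reason: stop
```

A finish_reason of "length" signals truncation by max_tokens, and "tool_calls" means the model wants to invoke a tool rather than finish its text.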

Troubleshooting

  • If streaming yields no tokens, verify your stream=True parameter is set and your API key is valid.
  • For incomplete streams, check network stability and handle finish_reason properly.
  • Access tokens via chunk.choices[0].delta.content, not chunk.choices[0].message.content; message exists only on non-streaming responses, and delta.content may be None on role or metadata chunks.
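A small defensive accessor keeps that last pitfall from biting. The helper below is illustrative, not part of the SDK, and the SimpleNamespace chunks only stand in for the real chunk type:

```python
from types import SimpleNamespace

def token_of(chunk):
    # Illustrative helper (not an SDK function): return the chunk's text,
    # or "" when the chunk has no choices or carries no content.
    if not chunk.choices:
        return ""
    return chunk.choices[0].delta.content or ""

# Mock chunks: a normal token, a metadata-only chunk, an empty-choices chunk.
token = SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="hi"))])
meta  = SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))])
empty = SimpleNamespace(choices=[])

print([token_of(c) for c in (token, meta, empty)])  # ['hi', '', '']
```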

Key Takeaways

  • Use stream=True in client.chat.completions.create() to receive incremental tokens as streaming events.
  • Each streaming chunk contains a delta object with partial content for real-time processing.
  • Async streaming enables non-blocking token consumption with async for loops.
  • Check finish_reason in chunks to detect stream completion or tool calls.
  • Always access streaming tokens via chunk.choices[0].delta.content and guard for None, since role and metadata chunks carry no text.
Verified 2026-04 · gpt-4o-mini, claude-3-5-sonnet-20241022