How-to · Beginner · 3 min read

How to print a streaming response in real time in Python

Quick answer
Use the OpenAI SDK's chat.completions.create method with stream=True to receive partial responses as they arrive. Iterate over the streamed chunks and print each chunk's content immediately for real-time output.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quote the requirement so the shell does not treat > as redirection)

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable.

  • Install SDK: pip install openai
  • Set environment variable in your shell: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
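Before calling the API, it can help to fail fast if the environment variable is missing. The helper below is an illustrative sketch (require_api_key is not part of the SDK); it simply reads the variable and raises a clear error if it is unset.

```python
import os

def require_api_key(var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running.")
    return key
```

Calling require_api_key() at startup surfaces a readable message instead of a KeyError deep inside the client code.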

Step by step

This example demonstrates how to print the streaming response from the OpenAI gpt-4o chat model in real time using Python.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem about AI."}],
    stream=True
)

for chunk in response:
    # Each chunk carries an incremental delta; content may be None
    delta = chunk.choices[0].delta
    if delta.content is not None:
        print(delta.content, end="", flush=True)
print()
output
AI whispers softly,
In circuits deep and bright,
Dreams of code and logic,
Dancing in the light.

Common variations

Streaming works the same way with other models such as gpt-4o-mini. For asynchronous streaming, use the AsyncOpenAI client with an async for loop. Adjust max_tokens or temperature as needed.

python
import asyncio
import os
from openai import AsyncOpenAI

async def stream_chat():
    # The async client's chat.completions.create returns an async stream
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Tell me a joke."}],
        stream=True
    )
    async for chunk in response:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
    print()

asyncio.run(stream_chat())
output
Why did the AI go to school? To improve its neural network!
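Often you want to keep the full response after streaming it, for logging or further processing. The helper below is an illustrative pattern (print_and_collect is not an SDK function): it prints each chunk of text as it arrives and returns the joined result. The list of strings stands in for the delta.content values you would pull from the API stream.

```python
def print_and_collect(chunks):
    """Print each text chunk as it arrives and return the full response."""
    parts = []
    for text in chunks:
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)

# Simulated chunk contents standing in for an API stream:
full = print_and_collect(["AI ", "whispers ", "softly."])
```

In the real loop you would call print_and_collect on a generator that yields delta.content for each chunk where it is not None.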

Troubleshooting

If streaming does not print in real time, make sure stream=True is set and that you iterate over the response object rather than waiting for a complete result. Output can also appear delayed when stdout is buffered (for example, when piping to a file), so keep flush=True on each print. If you get authentication errors, verify that the OPENAI_API_KEY environment variable is set correctly.
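To see the effect of flush=True independently of the API, you can simulate a stream locally. This sketch (slow_print is just an illustrative name) prints one character at a time; without flush=True, buffered output would appear all at once instead of incrementally.

```python
import sys
import time

def slow_print(text: str, delay: float = 0.05) -> None:
    """Print one character at a time, flushing so each appears immediately."""
    for ch in text:
        print(ch, end="", flush=True)
        time.sleep(delay)
    print()

slow_print("Streaming in real time!")
```

Run it in a terminal: with flush=True the characters trickle out; remove it and they may arrive in a single burst when the buffer fills or the program exits.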

Key Takeaways

  • Use stream=True in chat.completions.create to enable streaming responses.
  • Iterate over the response object to receive partial message chunks in real time.
  • Print each chunk's content immediately with flush=True for live output.
  • Async streaming is supported via the AsyncOpenAI client and an async for loop.
  • Read your API key from an environment variable; never hardcode it.
Verified 2026-04 · gpt-4o, gpt-4o-mini