
How to stream tool call results from an OpenAI assistant

Quick answer
Use the stream=True parameter in client.chat.completions.create() to receive incremental choices[0].delta chunks from the API. Text arrives in delta.content and tool call fragments in delta.tool_calls, letting you stream results in real time as the assistant generates them.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0

Setup

Install the latest OpenAI Python SDK and set your API key as an environment variable.

  • Install SDK: pip install openai --upgrade
  • Set your API key as an environment variable in your shell
bash
pip install openai --upgrade
export OPENAI_API_KEY='your_api_key_here'

Step by step

This example demonstrates streaming with the stream=True parameter. The tool call here is mocked through the prompt: no tools parameter is passed, so the model simply generates text that stands in for tool output and streams it incrementally.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Simulate a tool call by the assistant
messages = [
    {"role": "system", "content": "You are an assistant that calls a tool and streams its results."},
    {"role": "user", "content": "Call the weather tool for New York."}
]

response_stream = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    stream=True
)

print("Streaming response:")
full_response = ""
for chunk in response_stream:
    if not chunk.choices:
        continue  # some chunks carry no choices
    delta = chunk.choices[0].delta
    # delta is a pydantic model, not a dict: check the attribute, not membership
    if delta.content:
        print(delta.content, end="", flush=True)
        full_response += delta.content

print("\n\nFull response received:")
print(full_response)
output
Streaming response:
The weather in New York is sunny with a high of 75°F.

Full response received:
The weather in New York is sunny with a high of 75°F.

Common variations

  • Async streaming: Use async iteration with async for in an async function.
  • Different models: Replace gpt-4o with other supported models like gpt-4.1.
  • Handling tool calls: Integrate actual tool APIs and stream their outputs as the assistant generates them.
python
import asyncio
import os
from openai import AsyncOpenAI

# Async streaming uses the AsyncOpenAI client; the v1 SDK has no acreate() method.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def stream_response():
    messages = [
        {"role": "system", "content": "You are an assistant that streams tool call results."},
        {"role": "user", "content": "Call the stock price tool for AAPL."}
    ]

    response_stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )

    print("Async streaming response:")
    full_response = ""
    async for chunk in response_stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
            full_response += delta.content

    print("\n\nFull async response received:")
    print(full_response)

asyncio.run(stream_response())
output
Async streaming response:
The current stock price of AAPL is $175.32.

Full async response received:
The current stock price of AAPL is $175.32.

Troubleshooting

  • If streaming hangs or returns no data, verify your network connection and API key validity.
  • Ensure stream=True is set; otherwise, the response will be returned all at once.
  • Check for rate limits or quota exceeded errors in your API dashboard.
  • Use try-except blocks to catch and log exceptions during streaming.
python
import openai

try:
    response_stream = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )
    for chunk in response_stream:
        # process chunks here
        pass
except openai.RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except openai.APIConnectionError as e:
    print(f"Network error: {e}")
except openai.APIError as e:
    print(f"API error: {e}")

Key Takeaways

  • Use stream=True in client.chat.completions.create() to receive incremental assistant outputs.
  • Process choices[0].delta.content from each streamed chunk, skipping chunks where it is None, to build the full response.
  • Async streaming is supported with the AsyncOpenAI client and async for iteration.
  • Always handle exceptions and check API key and network connectivity when streaming.
  • Replace the system and user messages to customize tool calls and streaming behavior.
Verified 2026-04 · gpt-4o, gpt-4.1