
How to stream tool call results from an OpenAI assistant

Quick answer
Use the stream=True parameter in client.chat.completions.create() to receive incremental choices[0].delta chunks from the API. Text arrives in delta.content and tool call fragments in delta.tool_calls, letting you stream results in real time as the assistant generates them.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0

Setup

Install the latest OpenAI Python SDK and set your API key as an environment variable.

  • Install SDK: pip install openai --upgrade
  • Set your API key as an environment variable in your shell
bash
pip install openai --upgrade
export OPENAI_API_KEY='your_api_key_here'

Step by step

This example demonstrates streaming with the stream=True parameter. The tool call here is mocked through the prompt: no tools parameter is passed, so the model simply generates text that stands in for tool output and streams it incrementally.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Simulate a tool call by the assistant
messages = [
    {"role": "system", "content": "You are an assistant that calls a tool and streams its results."},
    {"role": "user", "content": "Call the weather tool for New York."}
]

response_stream = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    stream=True
)

print("Streaming response:")
full_response = ""
for chunk in response_stream:
    if not chunk.choices:
        continue  # some chunks carry no choices
    delta = chunk.choices[0].delta
    # delta is a pydantic model, not a dict: check the attribute, not membership
    if delta.content:
        print(delta.content, end="", flush=True)
        full_response += delta.content

print("\n\nFull response received:")
print(full_response)
output
Streaming response:
The weather in New York is sunny with a high of 75°F.

Full response received:
The weather in New York is sunny with a high of 75°F.

Common variations

  • Async streaming: Use async iteration with async for in an async function.
  • Different models: Replace gpt-4o with other supported models like gpt-4.1.
  • Handling tool calls: Integrate actual tool APIs and stream their outputs as the assistant generates them.
python
import asyncio
import os
from openai import AsyncOpenAI

# Async streaming uses the AsyncOpenAI client; the v1 SDK has no acreate() method.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def stream_response():
    messages = [
        {"role": "system", "content": "You are an assistant that streams tool call results."},
        {"role": "user", "content": "Call the stock price tool for AAPL."}
    ]

    response_stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )

    print("Async streaming response:")
    full_response = ""
    async for chunk in response_stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
            full_response += delta.content

    print("\n\nFull async response received:")
    print(full_response)

asyncio.run(stream_response())
output
Async streaming response:
The current stock price of AAPL is $175.32.

Full async response received:
The current stock price of AAPL is $175.32.

Troubleshooting

  • If streaming hangs or returns no data, verify your network connection and API key validity.
  • Ensure stream=True is set; otherwise, the response will be returned all at once.
  • Check for rate limits or quota exceeded errors in your API dashboard.
  • Use try-except blocks to catch and log exceptions during streaming.
python
import openai

try:
    response_stream = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    )
    for chunk in response_stream:
        # process chunks here
        pass
except openai.RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except openai.APIConnectionError as e:
    print(f"Network error: {e}")
except openai.APIError as e:
    print(f"API error: {e}")

Key Takeaways

  • Use stream=True in client.chat.completions.create() to receive incremental assistant outputs.
  • Process choices[0].delta.content from each streamed chunk, skipping chunks where it is None, to build the full response.
  • Async streaming is supported with the AsyncOpenAI client and async for iteration.
  • Always handle exceptions and check API key and network connectivity when streaming.
  • Replace the system and user messages to customize tool calls and streaming behavior.
Verified 2026-04 · gpt-4o, gpt-4.1