How-to · Beginner · 3 min read

How to stream Together AI responses

Quick answer
Use the openai Python SDK with base_url="https://api.together.xyz/v1" and stream=True in chat.completions.create() to stream Together AI responses. Iterate over the response asynchronously or synchronously to handle tokens as they arrive.

PREREQUISITES

  • Python 3.8+
  • Together AI API key
  • pip install "openai>=1.0" (quotes prevent the shell from treating >= as a redirect)

Setup

Install the openai Python package (v1+) and set your Together AI API key as an environment variable.

  • Install SDK: pip install openai
  • Set environment variable: export TOGETHER_API_KEY=your_api_key (Linux/macOS) or setx TOGETHER_API_KEY your_api_key (Windows)
```bash
pip install openai
```

Step by step

This example shows how to stream Together AI chat completions synchronously using the OpenAI SDK with the base_url override.

```python
import os
from openai import OpenAI

# Point the OpenAI client at Together AI's OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

messages = [{"role": "user", "content": "Explain the benefits of streaming AI responses."}]

# Create a streaming chat completion
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=messages,
    stream=True,
)

# Iterate over streamed chunks and print tokens as they arrive
for chunk in stream:
    if not chunk.choices:
        continue  # some chunks carry no choices (e.g. trailing metadata)
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

print()
```

Example output:

```text
Streaming lets you receive tokens in real time, reducing perceived latency and improving user experience by displaying partial answers immediately.
```
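Printing deltas as they arrive is often paired with collecting them into the complete response. A minimal sketch of that accumulation logic, using a plain list of strings to stand in for the `delta.content` values pulled from the stream above (`None` entries model chunks whose delta carries no text):

```python
def accumulate(deltas):
    """Print streamed deltas as they arrive and return the full text."""
    parts = []
    for delta in deltas:
        text = delta or ""  # delta.content can be None on some chunks
        print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)

# Stand-in for values read from the real stream.
full_text = accumulate(["Hello", ", ", None, "world!"])
print()
```

In real use you would call `accumulate` on a generator that yields `chunk.choices[0].delta.content` for each chunk.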

Common variations

You can also stream asynchronously by iterating with async for inside an async function, which fits naturally into async frameworks.

Change the model to other Together AI models by updating the model parameter.

```python
import os
import asyncio
from openai import AsyncOpenAI  # the async client is required for await / async for

async def async_stream():
    client = AsyncOpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1",
    )
    messages = [{"role": "user", "content": "Tell me a joke."}]

    stream = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=messages,
        stream=True,
    )

    async for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)

    print()

if __name__ == "__main__":
    asyncio.run(async_stream())
```

Example output:

```text
Why did the AI go to school? Because it wanted to improve its neural network!
```

Troubleshooting

  • If you get authentication errors, verify your TOGETHER_API_KEY environment variable is set correctly.
  • If streaming does not start, ensure you set stream=True and use the correct base_url for Together AI.
  • For connection timeouts, check your network and retry with exponential backoff.
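The backoff advice in the last bullet can be sketched as a small retry wrapper. This is an illustrative helper, not part of the OpenAI SDK; `with_retries` and the flaky demo function are hypothetical names, and in practice you would wrap the `chat.completions.create` call in the zero-argument callable:

```python
import random
import time

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Retry `call` on timeouts/connection errors with exponential backoff.

    `call` is any zero-argument function, e.g.
    lambda: client.chat.completions.create(..., stream=True).
    Other exceptions propagate immediately.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # Exponential backoff: base, 2x, 4x, ... plus random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)

# Demo with a function that fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
```

Jitter keeps many clients from retrying in lockstep after a shared outage.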

Key Takeaways

  • Use the OpenAI SDK with base_url="https://api.together.xyz/v1" to access Together AI.
  • Set stream=True in chat.completions.create() to receive tokens incrementally.
  • Iterate over the streaming response synchronously or asynchronously to handle real-time output.
  • Always keep your API key secure and set via environment variables.
  • Adjust model names as Together AI updates their offerings.
Verified 2026-04 · meta-llama/Llama-3.3-70B-Instruct-Turbo