How-to · Beginner · 3 min read

How to stream Together AI responses

Quick answer
Use the openai Python SDK with base_url="https://api.together.xyz/v1" and stream=True in chat.completions.create() to stream Together AI responses. Iterate over the response asynchronously or synchronously to handle tokens as they arrive.

PREREQUISITES

  • Python 3.8+
  • Together AI API key
  • pip install "openai>=1.0" (quotes prevent the shell from treating >= as a redirect)

Setup

Install the openai Python package (v1+) and set your Together AI API key as an environment variable.

  • Install SDK: pip install openai
  • Set environment variable: export TOGETHER_API_KEY=your_api_key (Linux/macOS) or setx TOGETHER_API_KEY your_api_key (Windows)
```bash
pip install openai
```

Step by step

This example shows how to stream Together AI chat completions synchronously using the OpenAI SDK with the base_url override.

```python
import os
from openai import OpenAI

# Point the OpenAI client at Together AI's OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

messages = [{"role": "user", "content": "Explain the benefits of streaming AI responses."}]

# Create a streaming chat completion
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=messages,
    stream=True,
)

# Iterate over streamed chunks and print tokens as they arrive
for chunk in stream:
    if not chunk.choices:
        continue  # some chunks carry no choices (e.g. trailing metadata)
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

print()
```

Example output:

```text
Streaming lets you receive tokens in real time, reducing perceived latency and improving user experience by displaying partial answers immediately.
```
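Printing deltas as they arrive is often paired with collecting them into the complete response. A minimal sketch of that accumulation logic, using a plain list of strings to stand in for the `delta.content` values pulled from the stream above (`None` entries model chunks whose delta carries no text):

```python
def accumulate(deltas):
    """Print streamed deltas as they arrive and return the full text."""
    parts = []
    for delta in deltas:
        text = delta or ""  # delta.content can be None on some chunks
        print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)

# Stand-in for values read from the real stream.
full_text = accumulate(["Hello", ", ", None, "world!"])
print()
```

In real use you would call `accumulate` on a generator that yields `chunk.choices[0].delta.content` for each chunk.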

Common variations

You can also stream asynchronously by iterating with async for inside an async function, which fits naturally into async frameworks.

Change the model to other Together AI models by updating the model parameter.

```python
import os
import asyncio
from openai import AsyncOpenAI  # the async client is required for await / async for

async def async_stream():
    client = AsyncOpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1",
    )
    messages = [{"role": "user", "content": "Tell me a joke."}]

    stream = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=messages,
        stream=True,
    )

    async for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)

    print()

if __name__ == "__main__":
    asyncio.run(async_stream())
```

Example output:

```text
Why did the AI go to school? Because it wanted to improve its neural network!
```

Troubleshooting

  • If you get authentication errors, verify your TOGETHER_API_KEY environment variable is set correctly.
  • If streaming does not start, ensure you set stream=True and use the correct base_url for Together AI.
  • For connection timeouts, check your network and retry with exponential backoff.
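The backoff advice in the last bullet can be sketched as a small retry wrapper. This is an illustrative helper, not part of the OpenAI SDK; `with_retries` and the flaky demo function are hypothetical names, and in practice you would wrap the `chat.completions.create` call in the zero-argument callable:

```python
import random
import time

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Retry `call` on timeouts/connection errors with exponential backoff.

    `call` is any zero-argument function, e.g.
    lambda: client.chat.completions.create(..., stream=True).
    Other exceptions propagate immediately.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # Exponential backoff: base, 2x, 4x, ... plus random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)

# Demo with a function that fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
```

Jitter keeps many clients from retrying in lockstep after a shared outage.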

Key Takeaways

  • Use the OpenAI SDK with base_url="https://api.together.xyz/v1" to access Together AI.
  • Set stream=True in chat.completions.create() to receive tokens incrementally.
  • Iterate over the streaming response synchronously or asynchronously to handle real-time output.
  • Always keep your API key secure and set via environment variables.
  • Adjust model names as Together AI updates their offerings.
Verified 2026-04 · meta-llama/Llama-3.3-70B-Instruct-Turbo