How to stream Together AI responses
Quick answer
Use the openai Python SDK with base_url="https://api.together.xyz/v1" and stream=True in chat.completions.create() to stream Together AI responses. Iterate over the response synchronously or asynchronously to handle tokens as they arrive.
Prerequisites
- Python 3.8+
- Together AI API key
- pip install "openai>=1.0"
Setup
Install the openai Python package (v1+) and set your Together AI API key as an environment variable.
- Install the SDK: pip install openai
- Set the environment variable: export TOGETHER_API_KEY=your_api_key (Linux/macOS) or setx TOGETHER_API_KEY your_api_key (Windows)
Step by step
This example shows how to stream Together AI chat completions synchronously using the OpenAI SDK with the base_url override.
import os
from openai import OpenAI

# Point the OpenAI client at Together AI's OpenAI-compatible endpoint
client = OpenAI(api_key=os.environ["TOGETHER_API_KEY"], base_url="https://api.together.xyz/v1")

messages = [{"role": "user", "content": "Explain the benefits of streaming AI responses."}]

# Create a streaming chat completion
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=messages,
    stream=True,
)

# Iterate over streamed chunks and print tokens as they arrive
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
print()
Output
Streaming allows you to receive tokens in real time, reducing latency and improving user experience by displaying partial answers immediately.
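If you also need the complete response text after streaming finishes, you can accumulate the deltas while printing them. A minimal sketch (collect_stream is our own helper name, not part of the SDK):

```python
def collect_stream(deltas, printer=None):
    """Join streamed token deltas into the full text, optionally echoing each one.

    `deltas` is any iterable of str-or-None values, e.g.
    (chunk.choices[0].delta.content for chunk in stream).
    """
    parts = []
    for delta in deltas:
        if delta:  # role/finish chunks may carry no content
            parts.append(delta)
            if printer:
                printer(delta)
    return "".join(parts)

# Stand-in deltas show the behavior without a network call
full_text = collect_stream(["Stream", None, "ing works"])
```

With a real stream you would pass a generator over the chunks and printer=lambda d: print(d, end="", flush=True).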
Common variations
You can also stream asynchronously with async for inside an async function, using the SDK's AsyncOpenAI client, for integration with async frameworks.
Change the model to other Together AI models by updating the model parameter.
import os
import asyncio
from openai import AsyncOpenAI  # the async client is required for await / async for

async def async_stream():
    client = AsyncOpenAI(api_key=os.environ["TOGETHER_API_KEY"], base_url="https://api.together.xyz/v1")
    messages = [{"role": "user", "content": "Tell me a joke."}]
    stream = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=messages,
        stream=True,
    )
    # Consume chunks as they arrive without blocking the event loop
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
    print()

if __name__ == "__main__":
    asyncio.run(async_stream())
Output
Why did the AI go to school? Because it wanted to improve its neural network!
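Streaming chunks are not guaranteed to carry text: role and finish_reason chunks have delta.content set to None, and some streams end with a chunk whose choices list is empty. A small guard keeps either loop robust (safe_delta is our own helper name; the SimpleNamespace objects below merely stand in for SDK chunk objects):

```python
from types import SimpleNamespace

def safe_delta(chunk):
    """Return the text delta from a streaming chunk, or "" when absent."""
    if not getattr(chunk, "choices", None):  # e.g. a trailing chunk with no choices
        return ""
    return chunk.choices[0].delta.content or ""

# Stand-in chunks demonstrating each case
text_chunk = SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="hi"))])
empty_chunk = SimpleNamespace(choices=[])
```

In the loops above you would write print(safe_delta(chunk), end="", flush=True) instead of indexing choices directly.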
Troubleshooting
- If you get authentication errors, verify your TOGETHER_API_KEY environment variable is set correctly.
- If streaming does not start, ensure you set stream=True and use the correct base_url for Together AI.
- For connection timeouts, check your network and retry with exponential backoff.
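The exponential-backoff advice can be sketched as a small wrapper. This is an illustration, not SDK functionality: with_backoff is our own helper, and with the real client you would typically widen retry_on to include openai.APIConnectionError and openai.APITimeoutError.

```python
import random
import time

def with_backoff(make_stream, retry_on=(ConnectionError, TimeoutError),
                 max_attempts=4, sleep=time.sleep):
    """Call make_stream(), retrying transient errors with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return make_stream()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(2 ** attempt + random.random())  # 1s, 2s, 4s, ... plus jitter

# With the real client you would pass, for example:
# stream = with_backoff(lambda: client.chat.completions.create(
#     model="meta-llama/Llama-3.3-70B-Instruct-Turbo", messages=messages, stream=True))
```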
Key Takeaways
- Use the OpenAI SDK with base_url="https://api.together.xyz/v1" to access Together AI.
- Set stream=True in chat.completions.create() to receive tokens incrementally.
- Iterate over the streaming response synchronously or asynchronously to handle real-time output.
- Always keep your API key secure and set via environment variables.
- Adjust model names as Together AI updates their offerings.