How to use Together AI API in Python
Direct answer
Use the openai Python SDK with base_url="https://api.together.xyz/v1" and your TOGETHER_API_KEY to call client.chat.completions.create() with your model and messages.
Setup
Install
pip install openai
Env vars
TOGETHER_API_KEY
Imports
from openai import OpenAI
import os
Examples
In: Hello, how are you?
Out: I'm doing great, thanks for asking! How can I assist you today?
In: Write a Python function to reverse a string.
Out: Here's a Python function to reverse a string:
```python
def reverse_string(s):
    return s[::-1]
```
In: Explain quantum computing in simple terms.
Out: Quantum computing uses quantum bits that can be both 0 and 1 at the same time, enabling powerful computations beyond classical computers.
Integration steps
- Install the OpenAI Python SDK with pip and set the TOGETHER_API_KEY environment variable.
- Import the OpenAI client and initialize it with your API key and Together AI base URL.
- Build the messages list with roles and content for the chat completion.
- Call client.chat.completions.create() with the Together AI model and messages.
- Extract the response text from response.choices[0].message.content.
- Use or display the generated text as needed.
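The messages list built in the steps above can also carry an optional system role to steer the model's behavior; a minimal sketch (the system prompt text here is just an example):

```python
# Build the messages list for a chat completion; the system role is
# optional and sets the assistant's tone, the user role carries the prompt.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Hello, how are you?"},
]

# Each entry is a plain dict with "role" and "content" keys.
print(messages[-1]["content"])  # → Hello, how are you?
```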
Full code
```python
from openai import OpenAI
import os

# Initialize Together AI client with API key and base URL
client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

# Prepare chat messages
messages = [
    {"role": "user", "content": "Hello, how are you?"}
]

# Create chat completion
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=messages,
)

# Extract and print the response text
print("Response:", response.choices[0].message.content)
```
Output
Response: I'm doing great, thanks for asking! How can I assist you today?
API trace
Request
{"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", "messages": [{"role": "user", "content": "Hello, how are you?"}]}
Response
{"choices": [{"message": {"content": "I'm doing great, thanks for asking! How can I assist you today?"}}], "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30}}
Extract
response.choices[0].message.content
Variants
Streaming chat completion ›
Use streaming to display partial results in real-time for better user experience with long responses.
```python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["TOGETHER_API_KEY"], base_url="https://api.together.xyz/v1")
messages = [{"role": "user", "content": "Tell me a story."}]

# stream=True yields chunks as tokens are generated
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=messages,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
print()
```
Async chat completion ›
Use async calls when integrating into asynchronous applications or frameworks to improve concurrency.
```python
import asyncio
from openai import AsyncOpenAI
import os

async def main():
    # Use AsyncOpenAI for async calls; the method is awaited create(),
    # not acreate()
    client = AsyncOpenAI(api_key=os.environ["TOGETHER_API_KEY"], base_url="https://api.together.xyz/v1")
    messages = [{"role": "user", "content": "Explain AI."}]
    response = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=messages,
    )
    print("Async response:", response.choices[0].message.content)

asyncio.run(main())
```
Use smaller model for faster, cheaper calls ›
Use smaller models for lower latency and cost when high accuracy or detail is not critical.
```python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["TOGETHER_API_KEY"], base_url="https://api.together.xyz/v1")
messages = [{"role": "user", "content": "Summarize the latest news."}]
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=messages,
)
print("Summary:", response.choices[0].message.content)
```
Performance
Latency: ~1-2 seconds per request for large models like Llama-3.3-70B-Instruct-Turbo
Cost: ~$0.03 to $0.10 per 1,000 tokens depending on model size
Rate limits: default tier 60 requests per minute and 100,000 tokens per day (check Together AI docs for updates)
- Use smaller models for less token consumption and faster responses.
- Limit prompt length by summarizing or truncating input.
- Cache frequent queries to avoid repeated calls.
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard call | ~1-2s | ~$0.05 | General purpose chat completions |
| Streaming | Starts immediately, total ~1-2s | ~$0.05 | Real-time UI with long outputs |
| Async call | ~1-2s | ~$0.05 | Concurrent or async apps |
| Smaller model | ~0.5-1s | ~$0.01 | Faster, cheaper, less detailed responses |
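The async row in the table pays off when several prompts run at once. A minimal sketch of the fan-out pattern with asyncio.gather, using a stub coroutine in place of a real AsyncOpenAI completion call:

```python
import asyncio

# fake_complete stands in for an AsyncOpenAI chat completion call;
# asyncio.gather awaits all prompts concurrently instead of sequentially.
async def fake_complete(prompt: str) -> str:
    await asyncio.sleep(0.01)  # simulates network latency
    return f"reply to: {prompt}"

async def run_all(prompts):
    return await asyncio.gather(*(fake_complete(p) for p in prompts))

results = asyncio.run(run_all(["Hello", "Explain AI.", "Tell me a joke."]))
print(len(results))  # → 3
```

With real API calls, total wall-clock time approaches the slowest single request rather than the sum of all of them.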
Quick tip
Always set the base_url to Together AI's endpoint and use your TOGETHER_API_KEY from environment variables to authenticate.
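This tip can be enforced with a small guard that fails fast when the key is missing (the helper name here is illustrative):

```python
import os

def get_together_key() -> str:
    # Fail fast with a clear message instead of an opaque 401 later.
    key = os.environ.get("TOGETHER_API_KEY")
    if not key:
        raise RuntimeError("Set the TOGETHER_API_KEY environment variable")
    return key
```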
Common mistake
Forgetting to set base_url to Together AI's endpoint sends requests to the default OpenAI API instead, which fails authentication with your TOGETHER_API_KEY.