Code beginner · 3 min read

How to use Together AI API in Python

Direct answer
Use the openai Python SDK with base_url="https://api.together.xyz/v1" and your TOGETHER_API_KEY to call client.chat.completions.create() with your model and messages.

Setup

Install
bash
pip install openai
Env vars
TOGETHER_API_KEY
Imports
python
from openai import OpenAI
import os
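Before creating the client, it can help to fail fast if the key is missing. A minimal sketch; the helper name `require_api_key` is our own, not part of any SDK:

```python
import os

def require_api_key(name: str = "TOGETHER_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before creating the client.")
    return key
```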

Examples

In: Hello, how are you?
Out: I'm doing great, thanks for asking! How can I assist you today?
In: Write a Python function to reverse a string.
Out: Here's a Python function to reverse a string: `def reverse_string(s): return s[::-1]`
In: Explain quantum computing in simple terms.
Out: Quantum computing uses quantum bits that can be both 0 and 1 at the same time, enabling powerful computations beyond classical computers.
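Each example above corresponds to a messages list passed to the API; multi-turn conversations append earlier assistant replies so the model keeps context. A sketch of how the prompts map onto that structure (the system prompt text is illustrative):

```python
# Single-turn: one user message per request
single_turn = [{"role": "user", "content": "Hello, how are you?"}]

# Multi-turn: include earlier assistant replies so the model sees the conversation so far
multi_turn = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a Python function to reverse a string."},
    {"role": "assistant", "content": "def reverse_string(s): return s[::-1]"},
    {"role": "user", "content": "Now explain how the slice works."},
]
```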

Integration steps

  1. Install the OpenAI Python SDK with pip and set the TOGETHER_API_KEY environment variable.
  2. Import the OpenAI client and initialize it with your API key and Together AI base URL.
  3. Build the messages list with roles and content for the chat completion.
  4. Call client.chat.completions.create() with the Together AI model and messages.
  5. Extract the response text from response.choices[0].message.content.
  6. Use or display the generated text as needed.

Full code

python
from openai import OpenAI
import os

# Initialize Together AI client with API key and base URL
client = OpenAI(api_key=os.environ["TOGETHER_API_KEY"], base_url="https://api.together.xyz/v1")

# Prepare chat messages
messages = [
    {"role": "user", "content": "Hello, how are you?"}
]

# Create chat completion
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=messages
)

# Extract and print the response text
print("Response:", response.choices[0].message.content)
output
Response: I'm doing great, thanks for asking! How can I assist you today?

API trace

Request
json
{"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", "messages": [{"role": "user", "content": "Hello, how are you?"}]}
Response
json
{"choices": [{"message": {"content": "I'm doing great, thanks for asking! How can I assist you today?"}}], "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30}}
Extract: response.choices[0].message.content
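The raw JSON above can be navigated the same way the SDK objects are. A stdlib-only sketch of pulling the reply and token counts out of a payload shaped like the trace:

```python
import json

# Response payload shaped like the API trace above
raw = """{"choices": [{"message": {"content": "I'm doing great, thanks for asking! How can I assist you today?"}}],
          "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30}}"""

payload = json.loads(raw)
reply = payload["choices"][0]["message"]["content"]  # same path as response.choices[0].message.content
tokens = payload["usage"]["total_tokens"]            # handy for tracking spend
print(reply)
print("total tokens:", tokens)
```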

Variants

Streaming chat completion

Use streaming to display partial results in real-time for better user experience with long responses.

python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["TOGETHER_API_KEY"], base_url="https://api.together.xyz/v1")

messages = [{"role": "user", "content": "Tell me a story."}]

stream = client.chat.completions.create(model="meta-llama/Llama-3.3-70B-Instruct-Turbo", messages=messages, stream=True)

for chunk in stream:
    if not chunk.choices:
        continue  # some stream events carry no choices
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
print()
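If you also need the full text once streaming finishes, collect the deltas as they arrive. A sketch using stand-in chunk objects (`SimpleNamespace` here only simulates the shape of the SDK's stream chunks):

```python
from types import SimpleNamespace

# Stand-ins for SDK stream chunks: each carries a partial "delta" of the reply
chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Once upon "))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="a time."))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),  # final chunk
]

parts = []
for chunk in chunks:
    delta = chunk.choices[0].delta.content or ""  # None on the final chunk becomes ""
    parts.append(delta)
full_text = "".join(parts)
print(full_text)
```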
Async chat completion

Use async calls when integrating into asynchronous applications or frameworks to improve concurrency.

python
import asyncio
from openai import AsyncOpenAI
import os

async def main():
    # Use the async client; the synchronous OpenAI client has no awaitable methods
    client = AsyncOpenAI(api_key=os.environ["TOGETHER_API_KEY"], base_url="https://api.together.xyz/v1")
    messages = [{"role": "user", "content": "Explain AI."}]
    response = await client.chat.completions.create(model="meta-llama/Llama-3.3-70B-Instruct-Turbo", messages=messages)
    print("Async response:", response.choices[0].message.content)

asyncio.run(main())
Use smaller model for faster, cheaper calls

Use smaller models for lower latency and cost when high accuracy or detail is not critical.

python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["TOGETHER_API_KEY"], base_url="https://api.together.xyz/v1")
messages = [{"role": "user", "content": "Summarize the latest news."}]
response = client.chat.completions.create(model="meta-llama/Llama-3.1-8b-instruct", messages=messages)
print("Summary:", response.choices[0].message.content)

Performance

Latency: ~1-2 seconds per request for large models like Llama-3.3-70B-Instruct-Turbo
Cost: varies by model size; Together AI bills per token, so check the current Together AI pricing page for exact rates
Rate limits: default tier: 60 requests per minute, 100,000 tokens per day (check Together AI docs for updates)
  • Use smaller models for less token consumption and faster responses.
  • Limit prompt length by summarizing or truncating input.
  • Cache frequent queries to avoid repeated calls.
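The caching tip can be sketched with `functools.lru_cache`; `complete_once` below is a hypothetical stand-in for the real API call:

```python
from functools import lru_cache

CALLS = 0  # counts how many times the "API" is actually hit

@lru_cache(maxsize=256)
def complete_once(prompt: str) -> str:
    """Stand-in for client.chat.completions.create; swap the body for a real call."""
    global CALLS
    CALLS += 1
    return f"reply to: {prompt}"

# Repeated identical prompts hit the cache instead of the API
complete_once("Summarize the latest news.")
complete_once("Summarize the latest news.")
print("API calls made:", CALLS)
```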
Approach      | Latency                          | Cost/call        | Best for
Standard call | ~1-2s                            | baseline         | General purpose chat completions
Streaming     | starts immediately, total ~1-2s  | same as standard | Real-time UI with long outputs
Async call    | ~1-2s                            | same as standard | Concurrent or async apps
Smaller model | ~0.5-1s                          | lower            | Faster, cheaper, less detailed responses
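When a request hits the rate limit, retrying with exponential backoff is the usual remedy. A minimal sketch under stated assumptions: `flaky_call` is a hypothetical stand-in for the real request, and in production you would catch the SDK's rate-limit exception instead of `RuntimeError`:

```python
import time

def with_backoff(call, retries=3, base_delay=0.01):
    """Retry `call`, doubling the delay between attempts; re-raise after the last try."""
    for attempt in range(retries):
        try:
            return call()
        except RuntimeError:  # stand-in for the SDK's rate-limit error
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

attempts = {"n": 0}

def flaky_call():
    """Simulates a call that is rate-limited twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(with_backoff(flaky_call))
```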

Quick tip

Always set the base_url to Together AI's endpoint and use your TOGETHER_API_KEY from environment variables to authenticate.

Common mistake

Forgetting to set base_url sends requests to OpenAI's API by default, which rejects your Together AI key with an authentication error.

Verified 2026-04 · meta-llama/Llama-3.3-70B-Instruct-Turbo, meta-llama/Llama-3.1-8b-instruct