How to use Llama on Together AI
Quick answer
Use the openai Python SDK with base_url="https://api.together.xyz/v1" and your TOGETHER_API_KEY to call Llama models on Together AI. Specify a model such as meta-llama/Llama-3.3-70B-Instruct-Turbo in client.chat.completions.create() along with your chat messages.
Prerequisites
- Python 3.8+
- Together AI API key (set the TOGETHER_API_KEY environment variable)
- pip install openai>=1.0
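A quick way to check the prerequisites before making any API calls (a local sketch, no network access needed):

```python
import os
import sys

# Verify the Python version and whether the API key is configured.
assert sys.version_info >= (3, 8), "Python 3.8+ required"
print("TOGETHER_API_KEY set:", bool(os.environ.get("TOGETHER_API_KEY")))
```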
Setup
Install the openai Python package and set your Together AI API key as an environment variable.
- Install the SDK:
  pip install openai
- Set the environment variable:
  export TOGETHER_API_KEY="your_api_key_here" (Linux/macOS) or set TOGETHER_API_KEY=your_api_key_here (Windows)
Step by step
Use the OpenAI-compatible SDK with Together AI's base URL and specify the Llama model. Send chat messages and print the assistant's reply.
```python
import os

from openai import OpenAI

# Point the OpenAI SDK at Together AI's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello, how do I use Llama on Together AI?"}],
)
print(response.choices[0].message.content)
```
Output:
Hello! To use Llama on Together AI, you call the chat completions endpoint with the Llama model name and your messages.
Common variations
You can use async calls, enable streaming for token-by-token output, or switch to smaller Llama models by changing the model parameter.
```python
import asyncio
import os

from openai import AsyncOpenAI

async def main():
    # Use the async client for coroutine-based calls.
    client = AsyncOpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1",
    )
    # Async streaming example: chunks arrive as they are generated.
    stream = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Stream a response from Llama."}],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)

asyncio.run(main())
```
Output:
Streaming response text printed token by token...
Troubleshooting
- If you get authentication errors, verify your TOGETHER_API_KEY environment variable is set correctly.
- If the model is not found, confirm you are using the exact model name meta-llama/Llama-3.3-70B-Instruct-Turbo.
- For network errors, check your internet connection and Together AI service status.
Key Takeaways
- Use the OpenAI Python SDK with Together AI's base_url to access Llama models.
- Specify the full model name, such as meta-llama/Llama-3.3-70B-Instruct-Turbo, in your requests.
- Enable streaming or async calls for more interactive usage.
- Always set your API key in the TOGETHER_API_KEY environment variable.
- Check model names and environment variables carefully to avoid common errors.