How to use Llama via Together AI API
Quick answer
Use the openai Python SDK with base_url="https://api.together.xyz/v1" and your TOGETHER_API_KEY to call Llama models such as meta-llama/Llama-3.3-70B-Instruct-Turbo. Create chat completions with client.chat.completions.create(), passing your messages to interact with the model.
Prerequisites
- Python 3.8+
- Together AI API key (set the TOGETHER_API_KEY environment variable)
- pip install "openai>=1.0"
Setup
Install the openai Python package (version 1.0 or higher) and set your Together AI API key as an environment variable TOGETHER_API_KEY. The Together AI API is OpenAI-compatible but requires specifying the base_url for requests.
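Before running any of the examples, export the key in your shell. A minimal sketch, assuming a POSIX-compatible shell (the key value below is a placeholder):

```shell
# Replace the placeholder with the real key from your Together AI account dashboard
export TOGETHER_API_KEY="your-together-api-key"
```

Add the line to your shell profile (e.g., ~/.bashrc) to make it persist across sessions.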
pip install "openai>=1.0"
Step by step
Use the OpenAI client from the openai package with your Together AI API key and base URL. Call the chat.completions.create() method with the Llama model ID and your chat messages. The example below sends a user prompt and prints the model's response.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Explain the benefits of using Llama models."}],
)

print(response.choices[0].message.content)
Output
Llama models provide state-of-the-art natural language understanding and generation capabilities, enabling efficient and accurate AI applications across various domains.
Common variations
- Use different Llama models by changing the model parameter, e.g., meta-llama/Llama-3.3-70B-Instruct-Turbo or smaller variants if available.
- For streaming responses, pass stream=True and iterate over the returned chunks.
- Async calls can be made with asyncio and the openai SDK's AsyncOpenAI client.
import os
import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1",
    )
    # Streaming example: with stream=True, chunks arrive as they are generated
    stream = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Tell me a joke."}],
        stream=True,
    )
    async for chunk in stream:
        # Some chunks carry no text (e.g., the initial role delta), hence `or ""`
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    print()

asyncio.run(main())
Output
Why did the AI go to school? Because it wanted to improve its neural network!
Troubleshooting
- If you get authentication errors, verify that your TOGETHER_API_KEY environment variable is set correctly.
- For "model not found" errors, confirm the model ID is correct and available on Together AI.
- Timeouts or slow responses can be mitigated by setting a client timeout, enabling retries, or trying again later.
Key Takeaways
- Use the OpenAI SDK with base_url set to Together AI's endpoint for Llama models.
- Set your Together AI API key in the environment variable TOGETHER_API_KEY.
- Call chat completions with model ID meta-llama/Llama-3.3-70B-Instruct-Turbo for best results.
- Streaming and async calls are supported via the OpenAI SDK patterns.
- Check environment variables and model IDs if you encounter errors.