How-to · Beginner · 3 min read

How to use Llama on Together AI

Quick answer
Use the openai Python SDK with base_url="https://api.together.xyz/v1" and your TOGETHER_API_KEY to call Llama models on Together AI. Specify the model like meta-llama/Llama-3.3-70B-Instruct-Turbo in client.chat.completions.create() with your chat messages.

Prerequisites

  • Python 3.8+
  • Together AI API key (set TOGETHER_API_KEY environment variable)
  • pip install "openai>=1.0" (quote the version specifier so the shell does not treat > as a redirect)

Setup

Install the openai Python package and set your Together AI API key as an environment variable.

  • Install SDK: pip install openai
  • Set environment variable: export TOGETHER_API_KEY="your_api_key_here" (Linux/macOS) or set TOGETHER_API_KEY=your_api_key_here (Windows)
bash
pip install "openai>=1.0"
export TOGETHER_API_KEY="your_api_key_here"
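
Before making any calls, it helps to confirm the key is actually visible to Python. A minimal sketch (the require_api_key helper is hypothetical, not part of any SDK):

```python
import os

# Hypothetical helper: fail fast with a clear message if the key is missing.
def require_api_key(var: str = "TOGETHER_API_KEY") -> str:
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running the examples.")
    return key
```

Calling require_api_key() at the top of a script surfaces a misconfigured environment immediately, instead of as an authentication error mid-request.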

Step by step

Use the OpenAI-compatible SDK with Together AI's base URL and specify the Llama model. Send chat messages and print the assistant's reply.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello, how do I use Llama on Together AI?"}]
)

print(response.choices[0].message.content)
output
Hello! To use Llama on Together AI, you call the chat completions endpoint with the Llama model name and your messages.
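
Real conversations usually carry prior turns, not a single user message. A small sketch of building a multi-turn messages list (the build_messages helper is hypothetical, not part of the SDK):

```python
# Hypothetical helper: assemble a system prompt, earlier turns, and the new
# user message into the messages list expected by chat.completions.create().
def build_messages(system, history, user):
    """history: list of (user_text, assistant_text) pairs from earlier turns."""
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user})
    return messages

# Usage with the client above:
# response = client.chat.completions.create(
#     model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
#     messages=build_messages("You are concise.", [("Hi", "Hello!")], "Thanks"),
# )
```

Appending each assistant reply back into history keeps the model aware of the conversation so far.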

Common variations

You can use async calls, enable streaming for token-by-token output, or switch to smaller Llama models by changing the model parameter.

python
import asyncio
import os

from openai import AsyncOpenAI

async def main():
    # Async usage requires the AsyncOpenAI client, not OpenAI.
    client = AsyncOpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1"
    )

    # Async streaming: pass stream=True and await the same create() method.
    stream = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Stream a response from Llama."}],
        stream=True
    )

    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)

asyncio.run(main())
output
Streaming response text printed token by token...
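
When streaming, you often want the assembled reply as well as the live printout. A minimal sketch (the assemble_stream helper is hypothetical) that joins the delta.content values, skipping the None deltas some chunks carry:

```python
# Hypothetical helper: join streamed delta strings into the full reply text.
# A chunk's delta.content can be None (e.g. the final chunk), so skip those.
def assemble_stream(deltas) -> str:
    return "".join(d for d in deltas if d)

# In the streaming loop, collect each chunk's delta.content into a list:
# parts = []
# async for chunk in stream:
#     parts.append(chunk.choices[0].delta.content)
# full_text = assemble_stream(parts)
```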

Troubleshooting

  • If you get authentication errors, verify your TOGETHER_API_KEY environment variable is set correctly.
  • If the model is not found, confirm you are using the exact model name meta-llama/Llama-3.3-70B-Instruct-Turbo.
  • For network errors, check your internet connection and Together AI service status.
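
For transient network errors and rate limits, retrying with exponential backoff is often enough. A minimal stdlib-only sketch (the with_retries helper is an illustration, not part of the SDK):

```python
import time

# Hypothetical helper: retry a flaky call with exponential backoff.
def with_retries(call, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt))

# Usage with the client above (narrow retry_on to the SDK's transient
# exception types rather than bare Exception in real code):
# reply = with_retries(lambda: client.chat.completions.create(
#     model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
#     messages=[{"role": "user", "content": "Hello"}],
# ))
```

Do not retry authentication errors; those need a fixed key, not another attempt.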

Key takeaways

  • Use the OpenAI Python SDK with Together AI's base_url to access Llama models.
  • Specify the full model name like meta-llama/Llama-3.3-70B-Instruct-Turbo in your requests.
  • Enable streaming or async calls for more interactive usage.
  • Always set your API key in the TOGETHER_API_KEY environment variable.
  • Check model names and environment variables carefully to avoid common errors.
Verified 2026-04 · meta-llama/Llama-3.3-70B-Instruct-Turbo