How-to · Beginner · 3 min read

How to use Llama on Together AI

Quick answer
Use the openai Python SDK with base_url="https://api.together.xyz/v1" and your TOGETHER_API_KEY to call Llama models on Together AI. Specify the model like meta-llama/Llama-3.3-70B-Instruct-Turbo in client.chat.completions.create() with your chat messages.

Prerequisites

  • Python 3.8+
  • Together AI API key (set TOGETHER_API_KEY environment variable)
  • pip install "openai>=1.0" (quote the version specifier so the shell does not treat > as a redirect)

Setup

Install the openai Python package and set your Together AI API key as an environment variable.

  • Install SDK: pip install openai
  • Set environment variable: export TOGETHER_API_KEY="your_api_key_here" (Linux/macOS) or set TOGETHER_API_KEY=your_api_key_here (Windows)
bash
pip install "openai>=1.0"
export TOGETHER_API_KEY="your_api_key_here"
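
Before making any calls, it helps to confirm the key is actually visible to Python. A minimal sketch (the require_api_key helper is hypothetical, not part of any SDK):

```python
import os

# Hypothetical helper: fail fast with a clear message if the key is missing.
def require_api_key(var: str = "TOGETHER_API_KEY") -> str:
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running the examples.")
    return key
```

Calling require_api_key() at the top of a script surfaces a misconfigured environment immediately, instead of as an authentication error mid-request.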

Step by step

Use the OpenAI-compatible SDK with Together AI's base URL and specify the Llama model. Send chat messages and print the assistant's reply.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello, how do I use Llama on Together AI?"}]
)

print(response.choices[0].message.content)
output
Hello! To use Llama on Together AI, you call the chat completions endpoint with the Llama model name and your messages.
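
Real conversations usually carry prior turns, not a single user message. A small sketch of building a multi-turn messages list (the build_messages helper is hypothetical, not part of the SDK):

```python
# Hypothetical helper: assemble a system prompt, earlier turns, and the new
# user message into the messages list expected by chat.completions.create().
def build_messages(system, history, user):
    """history: list of (user_text, assistant_text) pairs from earlier turns."""
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user})
    return messages

# Usage with the client above:
# response = client.chat.completions.create(
#     model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
#     messages=build_messages("You are concise.", [("Hi", "Hello!")], "Thanks"),
# )
```

Appending each assistant reply back into history keeps the model aware of the conversation so far.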

Common variations

You can use async calls, enable streaming for token-by-token output, or switch to smaller Llama models by changing the model parameter.

python
import asyncio
import os

from openai import AsyncOpenAI

async def main():
    # Async usage requires the AsyncOpenAI client, not OpenAI.
    client = AsyncOpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1"
    )

    # Async streaming: pass stream=True and await the same create() method.
    stream = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Stream a response from Llama."}],
        stream=True
    )

    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)

asyncio.run(main())
output
Streaming response text printed token by token...
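
When streaming, you often want the assembled reply as well as the live printout. A minimal sketch (the assemble_stream helper is hypothetical) that joins the delta.content values, skipping the None deltas some chunks carry:

```python
# Hypothetical helper: join streamed delta strings into the full reply text.
# A chunk's delta.content can be None (e.g. the final chunk), so skip those.
def assemble_stream(deltas) -> str:
    return "".join(d for d in deltas if d)

# In the streaming loop, collect each chunk's delta.content into a list:
# parts = []
# async for chunk in stream:
#     parts.append(chunk.choices[0].delta.content)
# full_text = assemble_stream(parts)
```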

Troubleshooting

  • If you get authentication errors, verify your TOGETHER_API_KEY environment variable is set correctly.
  • If the model is not found, confirm you are using the exact model name meta-llama/Llama-3.3-70B-Instruct-Turbo.
  • For network errors, check your internet connection and Together AI service status.
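
For transient network errors and rate limits, retrying with exponential backoff is often enough. A minimal stdlib-only sketch (the with_retries helper is an illustration, not part of the SDK):

```python
import time

# Hypothetical helper: retry a flaky call with exponential backoff.
def with_retries(call, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt))

# Usage with the client above (narrow retry_on to the SDK's transient
# exception types rather than bare Exception in real code):
# reply = with_retries(lambda: client.chat.completions.create(
#     model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
#     messages=[{"role": "user", "content": "Hello"}],
# ))
```

Do not retry authentication errors; those need a fixed key, not another attempt.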

Key takeaways

  • Use the OpenAI Python SDK with Together AI's base_url to access Llama models.
  • Specify the full model name like meta-llama/Llama-3.3-70B-Instruct-Turbo in your requests.
  • Enable streaming or async calls for more interactive usage.
  • Always set your API key in the TOGETHER_API_KEY environment variable.
  • Check model names and environment variables carefully to avoid common errors.
Verified 2026-04 · meta-llama/Llama-3.3-70B-Instruct-Turbo