How-to · beginner · 3 min read

How to use Llama via Together AI API

Quick answer
Use the openai Python SDK with base_url="https://api.together.xyz/v1" and your TOGETHER_API_KEY to call Llama models like meta-llama/Llama-3.3-70B-Instruct-Turbo. Create chat completions by calling client.chat.completions.create() with your messages to interact with the model.

Prerequisites

  • Python 3.8+
  • Together AI API key (set TOGETHER_API_KEY environment variable)
  • pip install "openai>=1.0" (quote the version specifier so the shell doesn't treat > as redirection)

Setup

Install the openai Python package (version 1.0 or higher) and set your Together AI API key as an environment variable TOGETHER_API_KEY. The Together AI API is OpenAI-compatible but requires specifying the base_url for requests.

bash
pip install "openai>=1.0"
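
On macOS or Linux, the API key can be exported in the shell so Python can read it from the environment (the value below is a placeholder, not a real key):

```shell
# Make the key available to Python via os.environ (placeholder value shown)
export TOGETHER_API_KEY="your-together-api-key"
```

On Windows PowerShell, set it with $env:TOGETHER_API_KEY = "your-together-api-key" instead.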

Step by step

Use the OpenAI client from the openai package with your Together AI API key and base URL. Call the chat.completions.create() method with the Llama model ID and your chat messages. The example below sends a user prompt and prints the model's response.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Explain the benefits of using Llama models."}]
)

print(response.choices[0].message.content)
output
Llama models provide state-of-the-art natural language understanding and generation capabilities, enabling efficient and accurate AI applications across various domains.

Common variations

  • Use a different Llama model by changing the model parameter; check Together AI's model catalog for the currently available Llama model IDs.
  • For streaming responses, use the stream=True parameter and iterate over the response.
  • Async calls can be implemented using asyncio with the openai SDK's async client if needed.
python
import os
import asyncio
from openai import AsyncOpenAI

async def main():
    # The async client takes the same api_key/base_url arguments
    client = AsyncOpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1"
    )

    # Streaming example: stream=True yields chunks as the model generates
    stream = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Tell me a joke."}],
        stream=True
    )

    async for chunk in stream:
        # A chunk's delta content may be None (e.g., on the final chunk)
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())
output
Why did the AI go to school? Because it wanted to improve its neural network!

Troubleshooting

  • If you get authentication errors, verify your TOGETHER_API_KEY environment variable is set correctly.
  • For model not found errors, confirm the model ID is correct and available in Together AI.
  • Timeouts or slow responses may require adjusting your network or retrying later.

Key Takeaways

  • Use the OpenAI SDK with base_url set to Together AI's endpoint for Llama models.
  • Set your Together AI API key in the environment variable TOGETHER_API_KEY.
  • Call chat completions with a supported Llama model ID, such as meta-llama/Llama-3.3-70B-Instruct-Turbo.
  • Streaming and async calls are supported via the OpenAI SDK patterns.
  • Check environment variables and model IDs if you encounter errors.
Verified 2026-04 · meta-llama/Llama-3.3-70B-Instruct-Turbo