How-to · Beginner · 3 min read

How to use Mixtral on Together AI

Quick answer
Use the openai Python SDK with base_url set to Together AI's OpenAI-compatible endpoint and specify model="mistralai/Mixtral-8x7B-Instruct-v0.1" to access Mixtral. Create chat completions with client.chat.completions.create(), passing your messages to get responses.

PREREQUISITES

  • Python 3.8+
  • Together AI API key
  • pip install openai>=1.0

Setup

Install the openai Python package and set your Together AI API key as an environment variable. Use the Together AI OpenAI-compatible API endpoint https://api.together.xyz/v1 when creating the client.

bash
pip install openai
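
On macOS/Linux you can export the key in your shell so the code below can read it; the variable name `TOGETHER_API_KEY` matches the examples in this article (on Windows, use `setx` or the PowerShell equivalent):

```shell
# Store your Together AI API key in an environment variable
# (replace the placeholder with your actual key)
export TOGETHER_API_KEY="your-api-key-here"
```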

Step by step

Use the OpenAI client from the openai package with base_url set to Together AI's API. Specify the Mixtral model mistralai/Mixtral-8x7B-Instruct-v0.1 and provide your chat messages. The example below sends a user prompt and prints the assistant's reply.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1"
)

response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Explain retrieval augmented generation (RAG) in simple terms."}]
)

print(response.choices[0].message.content)
output
Retrieval augmented generation (RAG) is a technique where a language model uses external documents or data sources to find relevant information and then generates answers based on that information. This helps improve accuracy and provides up-to-date responses.

Common variations

You can use streaming to receive tokens as they are generated by setting stream=True. For asynchronous usage, use the AsyncOpenAI client and await the call inside an async function. You can also switch to other Together AI models by changing the model parameter.

python
import asyncio
import os
from openai import AsyncOpenAI

async def main():
    # The async client is required for `async for` over the stream
    client = AsyncOpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1"
    )

    # Streaming example: tokens are printed as they arrive
    stream = await client.chat.completions.create(
        model="mistralai/Mixtral-8x7B-Instruct-v0.1",
        messages=[{"role": "user", "content": "List benefits of RAG."}],
        stream=True
    )

    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())
output
Improves accuracy by grounding responses in external data, enables up-to-date information retrieval, and reduces hallucinations.
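
Switching models only changes the model argument. As a minimal sketch, a small helper (the helper name, alias keys, and the 8x22B model name here are illustrative; check Together AI's model catalog for current names) can centralize request construction:

```python
# Hypothetical helper: build kwargs for client.chat.completions.create().
# Model names other than Mixtral 8x7B are examples; verify them in the catalog.
MODELS = {
    "mixtral": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "mixtral-8x22b": "mistralai/Mixtral-8x22B-Instruct-v0.1",
}

def build_request(alias: str, prompt: str, **overrides) -> dict:
    """Return keyword arguments for a chat completion request."""
    kwargs = {
        "model": MODELS[alias],
        "messages": [{"role": "user", "content": prompt}],
    }
    kwargs.update(overrides)  # e.g. stream=True, max_tokens=...
    return kwargs

# Usage:
# client.chat.completions.create(**build_request("mixtral", "Hi", stream=True))
```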

Troubleshooting

  • If you get authentication errors, verify your TOGETHER_API_KEY environment variable is set correctly.
  • If the model is not found, confirm you are using the exact model name mistralai/Mixtral-8x7B-Instruct-v0.1.
  • For network issues, check your internet connection and that https://api.together.xyz/v1 is reachable.

Key Takeaways

  • Use the OpenAI SDK with base_url set to Together AI's endpoint to access Mixtral.
  • Specify model="mistralai/Mixtral-8x7B-Instruct-v0.1" for Mixtral on Together AI.
  • Streaming and async calls improve responsiveness for interactive applications.
Verified 2026-04 · mistralai/Mixtral-8x7B-Instruct-v0.1