How to use Mixtral on Together AI
Quick answer
Use the openai Python SDK with the base_url set to Together AI's endpoint and specify model="mistralai/Mixtral-8x7B-Instruct-v0.1" to access Mixtral. Create chat completions with client.chat.completions.create(), passing your messages to get responses.
Prerequisites
- Python 3.8+
- Together AI API key
- pip install "openai>=1.0"
Setup
Install the openai Python package and set your Together AI API key as an environment variable. Use the Together AI OpenAI-compatible API endpoint https://api.together.xyz/v1 when creating the client.
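For example, on macOS or Linux the install and key setup can be done in one shell session (the key value below is a placeholder, not a real credential):

```shell
# Install the OpenAI-compatible Python client
pip install "openai>=1.0"

# Set your Together AI API key for the current session (placeholder shown)
export TOGETHER_API_KEY="your-api-key-here"
```

On Windows, use setx TOGETHER_API_KEY "your-api-key-here" instead of export.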
pip install openai
Step by step
Use the OpenAI client from the openai package with the base_url set to Together AI's API. Specify the Mixtral model mistralai/Mixtral-8x7B-Instruct-v0.1 and provide your chat messages. The example below sends a user prompt and prints the assistant's reply.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Explain retrieval augmented generation (RAG) in simple terms."}],
)
print(response.choices[0].message.content)
Output
Retrieval augmented generation (RAG) is a technique where a language model uses external documents or data sources to find relevant information and then generates answers based on that information. This helps improve accuracy and provides up-to-date responses.
Common variations
You can stream tokens as they are generated by setting stream=True. For asynchronous usage, create an AsyncOpenAI client and await the call. You can also switch to other Together AI models by changing the model parameter.
import asyncio
import os
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1",
    )
    # Streaming example: tokens are printed as they arrive
    stream = await client.chat.completions.create(
        model="mistralai/Mixtral-8x7B-Instruct-v0.1",
        messages=[{"role": "user", "content": "List benefits of RAG."}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())
Output
Improves accuracy by grounding responses in external data, enables up-to-date information retrieval, and reduces hallucinations.
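You can also tune generation with standard OpenAI-compatible parameters such as temperature and max_tokens. A minimal sketch that builds the request arguments before sending them (the specific values and the build_request helper are illustrative assumptions, not Together AI recommendations):

```python
import os

def build_request(prompt: str, temperature: float = 0.7, max_tokens: int = 256) -> dict:
    """Assemble chat-completion arguments; temperature and max_tokens
    are standard OpenAI-compatible sampling controls."""
    return {
        "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

kwargs = build_request("Summarize RAG in one sentence.", temperature=0.2)

# Only call the API when a key is configured
if os.environ.get("TOGETHER_API_KEY"):
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1",
    )
    response = client.chat.completions.create(**kwargs)
    print(response.choices[0].message.content)
```

Lower temperature values make output more deterministic; max_tokens caps the length of the reply.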
Troubleshooting
- If you get authentication errors, verify your TOGETHER_API_KEY environment variable is set correctly.
- If the model is not found, confirm you are using the exact model name mistralai/Mixtral-8x7B-Instruct-v0.1.
- For network issues, check your internet connection and that https://api.together.xyz/v1 is reachable.
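For transient network failures, a common pattern is to retry with exponential backoff. A minimal sketch (the delay schedule, retry count, and helper names are illustrative assumptions, not Together AI guidance):

```python
import time

def backoff_delays(retries, base=1.0, max_delay=30.0):
    """Exponential backoff schedule: base * 2**attempt, capped at max_delay."""
    return [min(base * (2 ** attempt), max_delay) for attempt in range(retries)]

def call_with_retries(make_request, retries=4, base=1.0):
    """Call make_request(), retrying with backoff; re-raise the last error."""
    last_error = None
    for delay in backoff_delays(retries, base=base):
        try:
            return make_request()
        except Exception as exc:  # in practice, catch openai.APIConnectionError
            last_error = exc
            time.sleep(delay)
    raise last_error
```

Wrap your API call in a small lambda and pass it in, e.g. call_with_retries(lambda: client.chat.completions.create(**kwargs)).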
Key Takeaways
- Use the OpenAI SDK with base_url set to Together AI's endpoint to access Mixtral.
- Specify model="mistralai/Mixtral-8x7B-Instruct-v0.1" for Mixtral on Together AI.
- Streaming and async calls improve responsiveness for interactive applications.