How to use Mixtral on Together AI
Quick answer
Use the openai Python SDK with the base_url set to Together AI's endpoint and specify model="mistralai/Mixtral-8x7B-Instruct-v0.1" to access Mixtral. Create chat completions with client.chat.completions.create(), passing your messages to get responses.
Prerequisites
- Python 3.8+
- Together AI API key
- pip install "openai>=1.0"
Setup
Install the openai Python package and set your Together AI API key as an environment variable. Use the Together AI OpenAI-compatible API endpoint https://api.together.xyz/v1 when creating the client.
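For example, on macOS or Linux the install and key setup can be done in one shell session (the key value below is a placeholder, not a real credential):

```shell
# Install the OpenAI-compatible Python client
pip install "openai>=1.0"

# Set your Together AI API key for the current session (placeholder shown)
export TOGETHER_API_KEY="your-api-key-here"
```

On Windows, use setx TOGETHER_API_KEY "your-api-key-here" instead of export.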
pip install openai
Step by step
Use the OpenAI client from the openai package with the base_url set to Together AI's API. Specify the Mixtral model mistralai/Mixtral-8x7B-Instruct-v0.1 and provide your chat messages. The example below sends a user prompt and prints the assistant's reply.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Explain retrieval augmented generation (RAG) in simple terms."}],
)
print(response.choices[0].message.content)
Output
Retrieval augmented generation (RAG) is a technique where a language model uses external documents or data sources to find relevant information and then generates answers based on that information. This helps improve accuracy and provides up-to-date responses.
Common variations
You can stream tokens as they are generated by setting stream=True. For asynchronous usage, create an AsyncOpenAI client and await the call. You can also switch to other Together AI models by changing the model parameter.
import asyncio
import os
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1",
    )
    # Streaming example: tokens are printed as they arrive
    stream = await client.chat.completions.create(
        model="mistralai/Mixtral-8x7B-Instruct-v0.1",
        messages=[{"role": "user", "content": "List benefits of RAG."}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())
Output
Improves accuracy by grounding responses in external data, enables up-to-date information retrieval, and reduces hallucinations.
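You can also tune generation with standard OpenAI-compatible parameters such as temperature and max_tokens. A minimal sketch that builds the request arguments before sending them (the specific values and the build_request helper are illustrative assumptions, not Together AI recommendations):

```python
import os

def build_request(prompt: str, temperature: float = 0.7, max_tokens: int = 256) -> dict:
    """Assemble chat-completion arguments; temperature and max_tokens
    are standard OpenAI-compatible sampling controls."""
    return {
        "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

kwargs = build_request("Summarize RAG in one sentence.", temperature=0.2)

# Only call the API when a key is configured
if os.environ.get("TOGETHER_API_KEY"):
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1",
    )
    response = client.chat.completions.create(**kwargs)
    print(response.choices[0].message.content)
```

Lower temperature values make output more deterministic; max_tokens caps the length of the reply.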
Troubleshooting
- If you get authentication errors, verify your TOGETHER_API_KEY environment variable is set correctly.
- If the model is not found, confirm you are using the exact model name mistralai/Mixtral-8x7B-Instruct-v0.1.
- For network issues, check your internet connection and that https://api.together.xyz/v1 is reachable.
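For transient network failures, a common pattern is to retry with exponential backoff. A minimal sketch (the delay schedule, retry count, and helper names are illustrative assumptions, not Together AI guidance):

```python
import time

def backoff_delays(retries, base=1.0, max_delay=30.0):
    """Exponential backoff schedule: base * 2**attempt, capped at max_delay."""
    return [min(base * (2 ** attempt), max_delay) for attempt in range(retries)]

def call_with_retries(make_request, retries=4, base=1.0):
    """Call make_request(), retrying with backoff; re-raise the last error."""
    last_error = None
    for delay in backoff_delays(retries, base=base):
        try:
            return make_request()
        except Exception as exc:  # in practice, catch openai.APIConnectionError
            last_error = exc
            time.sleep(delay)
    raise last_error
```

Wrap your API call in a small lambda and pass it in, e.g. call_with_retries(lambda: client.chat.completions.create(**kwargs)).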
Key Takeaways
- Use the OpenAI SDK with base_url set to Together AI's endpoint to access Mixtral.
- Specify model="mistralai/Mixtral-8x7B-Instruct-v0.1" for Mixtral on Together AI.
- Streaming and async calls improve responsiveness for interactive applications.