How to use Mixtral on Fireworks AI
Quick answer
Use the openai Python SDK with the base_url set to Fireworks AI's endpoint and specify the model as accounts/fireworks/models/mixtral-8x7b-instruct. Call client.chat.completions.create() with your messages to interact with Mixtral on Fireworks AI.
PREREQUISITES
- Python 3.8+
- Fireworks AI API key
- pip install "openai>=1.0" (quote the requirement so the shell does not treat >= as redirection)
Setup
Install the openai Python package and set your Fireworks AI API key as an environment variable. Use the Fireworks AI OpenAI-compatible endpoint for API calls.
pip install "openai>=1.0"
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
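Before making any API calls, it helps to confirm the key is actually set. The helper below is a small sketch, assuming the key is stored in the FIREWORKS_API_KEY environment variable as described above; the function name require_api_key is illustrative, not part of any SDK.

```python
import os

def require_api_key() -> str:
    """Return the Fireworks AI API key, or exit with a clear message if unset."""
    key = os.environ.get("FIREWORKS_API_KEY")
    if not key:
        raise SystemExit("Set the FIREWORKS_API_KEY environment variable first.")
    return key

if __name__ == "__main__":
    print("API key found:", bool(require_api_key()))
```

Failing fast here turns a confusing authentication error later into an immediate, actionable message.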
Step by step
Use the OpenAI client with base_url set to Fireworks AI's API endpoint. Specify the Mixtral model and send chat messages. The example below sends a prompt and prints the response.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["FIREWORKS_API_KEY"],
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    messages=[{"role": "user", "content": "Explain retrieval-augmented generation (RAG) in simple terms."}],
)

print("Response:", response.choices[0].message.content)
output
Response: Retrieval-augmented generation (RAG) is a technique where an AI model retrieves relevant information from a large database or documents and then uses that information to generate accurate and informed responses.
Common variations
You can enable streaming to receive tokens as they are generated, or make asynchronous calls with the AsyncOpenAI client. You can also switch to other Fireworks AI models by changing the model parameter.
import asyncio
import os
from openai import AsyncOpenAI

async def main():
    # Use the async client so await and async-for are supported.
    client = AsyncOpenAI(
        api_key=os.environ["FIREWORKS_API_KEY"],
        base_url="https://api.fireworks.ai/inference/v1",
    )
    # Async streaming example
    stream = await client.chat.completions.create(
        model="accounts/fireworks/models/mixtral-8x7b-instruct",
        messages=[{"role": "user", "content": "Tell me a joke."}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

if __name__ == "__main__":
    asyncio.run(main())
output
Why did the AI go to school? Because it wanted to improve its neural networks!
Troubleshooting
- If you get authentication errors, verify your FIREWORKS_API_KEY environment variable is set correctly.
- For model not found errors, confirm the model name is exactly accounts/fireworks/models/mixtral-8x7b-instruct.
- Network errors may require checking your internet connection or firewall settings.
Key Takeaways
- Use the OpenAI SDK with Fireworks AI's base_url to access Mixtral models.
- Specify the full model name starting with 'accounts/fireworks/models/' for Fireworks AI.
- Streaming and async calls are supported for real-time token generation.
- Always set your API key in the environment variable FIREWORKS_API_KEY.
- Check exact model names and API endpoint URLs to avoid errors.