How-to · Beginner · 3 min read

How to use Mixtral on Fireworks AI

Quick answer
Use the openai Python SDK with base_url set to Fireworks AI's OpenAI-compatible endpoint and specify the model as accounts/fireworks/models/mixtral-8x7b-instruct. Call client.chat.completions.create() with your messages to interact with Mixtral on Fireworks AI.

PREREQUISITES

  • Python 3.8+
  • Fireworks AI API key
  • pip install "openai>=1.0"

Setup

Install the openai Python package and set your Fireworks AI API key as an environment variable. Use the Fireworks AI OpenAI-compatible endpoint for API calls.

bash
pip install "openai>=1.0"
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
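The SDK does not pick up the key by itself; export it in your shell before running any of the snippets below (the placeholder value is just an example, substitute your real key):

```shell
# Replace the placeholder with your actual Fireworks AI API key.
export FIREWORKS_API_KEY="your-api-key-here"

# Confirm the variable is visible to child processes like the Python interpreter.
echo "key set: ${FIREWORKS_API_KEY:+yes}"
```

On Windows, use `setx FIREWORKS_API_KEY ...` (cmd) or `$env:FIREWORKS_API_KEY = "..."` (PowerShell) instead.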

Step by step

Use the OpenAI client with base_url set to Fireworks AI's API endpoint. Specify the Mixtral model and send chat messages. The example below sends a prompt and prints the response.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["FIREWORKS_API_KEY"],
    base_url="https://api.fireworks.ai/inference/v1"
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    messages=[{"role": "user", "content": "Explain retrieval-augmented generation (RAG) in simple terms."}]
)

print("Response:", response.choices[0].message.content)
output
Response: Retrieval-augmented generation (RAG) is a technique where an AI model retrieves relevant information from a large database or documents and then uses that information to generate accurate and informed responses.
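The create() call also accepts the usual optional OpenAI chat parameters, such as a system message, temperature, and max_tokens. The sketch below only assembles the keyword arguments (the values are illustrative, not recommendations); you would pass them to client.chat.completions.create(**request) exactly as in the example above:

```python
# Sketch: optional parameters for the same create() call.
# Parameter names follow the standard OpenAI chat API; the values are examples.
request = {
    "model": "accounts/fireworks/models/mixtral-8x7b-instruct",
    "messages": [
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain RAG in one sentence."},
    ],
    "temperature": 0.2,  # lower values make output more deterministic
    "max_tokens": 128,   # hard cap on generated tokens
}

# Usage: client.chat.completions.create(**request)
print(sorted(request))
```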

Common variations

You can enable streaming to receive tokens as they are generated, or make async calls with the SDK's AsyncOpenAI client. You can also switch to other Fireworks AI models by changing the model parameter.

python
import asyncio
import os
from openai import AsyncOpenAI

async def main():
    # The async client is required for `await` and `async for`.
    client = AsyncOpenAI(
        api_key=os.environ["FIREWORKS_API_KEY"],
        base_url="https://api.fireworks.ai/inference/v1"
    )

    # Async streaming example
    stream = await client.chat.completions.create(
        model="accounts/fireworks/models/mixtral-8x7b-instruct",
        messages=[{"role": "user", "content": "Tell me a joke."}],
        stream=True
    )

    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

if __name__ == "__main__":
    asyncio.run(main())
output
Why did the AI go to school? Because it wanted to improve its neural networks!

Troubleshooting

  • If you get authentication errors, verify your FIREWORKS_API_KEY environment variable is set correctly.
  • For model not found errors, confirm the model name is exactly accounts/fireworks/models/mixtral-8x7b-instruct.
  • Network errors may require checking your internet connection or firewall settings.
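A quick way to rule out the first two issues is a small pre-flight check. The helper below only inspects the environment and the model name and makes no API call (check_setup is a hypothetical helper, not part of either SDK):

```python
import os

def check_setup(env, model: str) -> list:
    """Return a list of setup problems; an empty list means the basics look right."""
    problems = []
    if not env.get("FIREWORKS_API_KEY"):
        problems.append("FIREWORKS_API_KEY is not set")
    if not model.startswith("accounts/fireworks/models/"):
        problems.append("model name must start with 'accounts/fireworks/models/'")
    return problems

print(check_setup(os.environ, "accounts/fireworks/models/mixtral-8x7b-instruct"))
```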

Key Takeaways

  • Use the OpenAI SDK with Fireworks AI's base_url to access Mixtral models.
  • Specify the full model name starting with 'accounts/fireworks/models/' for Fireworks AI.
  • Streaming and async calls are supported for real-time token generation.
  • Always set your API key in the environment variable FIREWORKS_API_KEY.
  • Check exact model names and API endpoint URLs to avoid errors.
Verified 2026-04 · accounts/fireworks/models/mixtral-8x7b-instruct