Mistral hardware requirements
Quick answer
To run Mistral models locally, you need a modern GPU with at least 16GB of VRAM for mistral-large-latest, or 8GB of VRAM for smaller variants. CPU-only usage is possible but significantly slower; at minimum, use a multi-core CPU with 16GB of RAM and SSD storage.

Prerequisites
- Python 3.8+
- pip install openai>=1.0
- MISTRAL_API_KEY environment variable set
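Before installing anything, you can sanity-check the CPU and RAM figures above with the standard library alone. This is a rough sketch: it uses os.sysconf, which is POSIX-only, and it cannot see GPU VRAM (that would require a library such as torch or a tool like nvidia-smi).

```python
import os

# Rough prerequisite check against the figures above (POSIX-only RAM probe).
cores = os.cpu_count() or 1
try:
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
except (AttributeError, ValueError, OSError):
    ram_gb = None  # os.sysconf is unavailable on Windows

print(f"CPU cores: {cores}")
if ram_gb is None:
    print("RAM: unknown on this platform")
else:
    status = "meets" if ram_gb >= 16 else "is below"
    print(f"RAM: {ram_gb:.1f} GB, which {status} the 16GB recommendation")
```

Note this only covers the CPU-path requirements; GPU VRAM must still be checked separately.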
Setup
Install the openai Python SDK to access Mistral models via API. Set your MISTRAL_API_KEY as an environment variable for authentication.
```shell
pip install "openai>=1.0"
```

Step by step
Use the OpenAI-compatible SDK to call Mistral models. This example shows a simple chat completion request using mistral-large-latest.
```python
import os
from openai import OpenAI

# Mistral's API is OpenAI-compatible; point the client at the Mistral endpoint.
client = OpenAI(
    api_key=os.environ["MISTRAL_API_KEY"],
    base_url="https://api.mistral.ai/v1",
)

response = client.chat.completions.create(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Hello, what are the hardware requirements for Mistral models?"}],
)
print(response.choices[0].message.content)
```

Output
Mistral models require a GPU with at least 16GB VRAM for large models, 8GB for smaller ones. CPU-only is possible but slower. Ensure 16GB RAM and SSD storage for best performance.
Common variations
You can use smaller Mistral models like mistral-small-latest, which require less VRAM (around 8GB). For asynchronous calls, use the AsyncOpenAI client with Python's asyncio. Streaming responses are supported via the stream=True parameter.
```python
import os
import asyncio
from openai import AsyncOpenAI

async def async_chat():
    # Use the async client; the synchronous OpenAI client has no awaitable methods.
    client = AsyncOpenAI(
        api_key=os.environ["MISTRAL_API_KEY"],
        base_url="https://api.mistral.ai/v1",
    )
    stream = await client.chat.completions.create(
        model="mistral-small-latest",
        messages=[{"role": "user", "content": "Explain Mistral hardware needs."}],
        stream=True,
    )
    # With stream=True the call returns an async iterator of chunks;
    # each chunk's delta may carry a piece of the generated text.
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")

asyncio.run(async_chat())
```

Output
Mistral models require a GPU with at least 8GB VRAM for smaller models. CPU usage is possible but slower. Recommended RAM is 16GB or more.
Troubleshooting
- If you encounter CUDA out of memory errors, reduce batch size or switch to a smaller model like mistral-small-latest.
- For slow CPU-only inference, consider upgrading to a GPU with at least 16GB of VRAM.
- Ensure your MISTRAL_API_KEY is correctly set in your environment variables.
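A quick way to rule out the last item before debugging anything else is to check the environment variable directly; this minimal sketch just reports whether it is present:

```python
import os

# Sanity check for the environment variable every example above relies on.
if os.environ.get("MISTRAL_API_KEY"):
    print("MISTRAL_API_KEY is set")
else:
    print("MISTRAL_API_KEY is missing; export it before running the examples")
```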
Key Takeaways
- Use a GPU with at least 16GB of VRAM to run mistral-large-latest efficiently.
- Smaller models like mistral-small-latest require around 8GB of VRAM and less memory.
- CPU-only usage is possible but significantly slower; 16GB of RAM and SSD storage are recommended.
- Set MISTRAL_API_KEY in your environment to authenticate API calls.
- Use the OpenAI-compatible SDK with base_url="https://api.mistral.ai/v1" for integration.
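The VRAM thresholds above can be folded into a small helper for choosing a model at runtime. pick_model is a hypothetical function written for this sketch, not part of any SDK:

```python
def pick_model(vram_gb: float) -> str:
    """Map available VRAM to a model name per the thresholds above (hypothetical helper)."""
    return "mistral-large-latest" if vram_gb >= 16 else "mistral-small-latest"

print(pick_model(24))  # prints "mistral-large-latest": a 24GB card clears the 16GB bar
print(pick_model(8))   # prints "mistral-small-latest": 8GB fits the smaller variant
```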