Mistral AI Cheat Sheet: API, Models & Parameters
Mistral AI, a Paris-based company, serves its models (including open-weight ones) through a hosted inference API with function calling support.
In short: a European-hosted inference service for Mistral models, with built-in support for multi-step function (tool) calling.
Common Use Patterns
# Basic chat completion
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "user", "content": "What is 2+2?"}
    ]
)
print(response.choices[0].message.content)
# Example output: 4

# Streaming
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

with client.chat.stream(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Write a poem about coding"}]
) as response:
    for chunk in response:
        # delta.content can be None or empty on some chunks
        content = chunk.data.choices[0].delta.content
        if content:
            print(content, end="", flush=True)

# Function calling
import os
import json
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto"
)

tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    for tool_call in tool_calls:
        args = tool_call.function.arguments
        # arguments may arrive as a JSON string; parse it into a dict
        if isinstance(args, str):
            args = json.loads(args)
        print(f"Function: {tool_call.function.name}")
        print(f"Args: {args}")

# Embeddings
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.embeddings.create(
    model="mistral-embed",
    inputs=["What is machine learning?", "Define AI"]
)
for i, item in enumerate(response.data):
    print(f"Text {i}: {len(item.embedding)} dimensions")
# Output:
# Text 0: 1024 dimensions
# Text 1: 1024 dimensions

# System prompt
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "system", "content": "You are a Python expert. Answer only with code."},
        {"role": "user", "content": "How do I read a CSV file?"}
    ]
)
print(response.choices[0].message.content)

# Sampling parameters
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Creative: higher temperature for more varied output
response_creative = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Write a haiku"}],
    temperature=0.9,
    max_tokens=100
)
print(response_creative.choices[0].message.content)

# Deterministic: temperature=0.0
response_precise = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    temperature=0.0,
    max_tokens=50
)
print(response_precise.choices[0].message.content)

Core Request Parameters
chat.complete() parameters
| Parameter | Type | Default | Notes |
|---|---|---|---|
| model | string | required | e.g. mistral-large-latest, mistral-medium-latest, mistral-small-latest, codestral-latest, mistral-nemo-latest |
| messages | array[dict] | required | List of {"role": "system" \| "user" \| "assistant" \| "tool", "content": "..."} |
| temperature | float | 0.7 | 0.0–1.0. Higher = more varied, creative output; lower = more focused and deterministic |
| max_tokens | integer | null | Maximum tokens to generate. Prompt tokens + max_tokens must fit in the model's context window (see Mistral Models Comparison) |
| top_p | float | 1.0 | Nucleus sampling: 0.8 = sample only from the smallest set of tokens whose cumulative probability reaches 80% |
| min_tokens | integer | null | Minimum number of tokens to generate |
| stop | string \| array | null | Stop sequence(s), e.g. ["\n\n", "END"] |
| random_seed | integer | null | Sampling seed; set it (ideally with temperature=0) for more reproducible outputs |
| tools | array[dict] | null | Function definitions for function calling |
| tool_choice | string \| dict | auto | "auto" \| "none" \| "any", or {"type": "function", "function": {"name": "..."}} to force one function |
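To make the top_p row concrete, here is a minimal local sketch of nucleus filtering on a toy next-token distribution. It only illustrates the idea: the API applies top_p server-side, and the token probabilities and the helper name `nucleus_filter` are made up for illustration.

```python
def nucleus_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize them to sum to 1."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    # Rank tokens from most to least likely
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: dict[str, float] = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize the surviving tokens
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


# Hypothetical next-token probabilities
toy = {"cat": 0.5, "dog": 0.3, "eel": 0.15, "yak": 0.05}
print(nucleus_filter(toy, 0.75))
```

With top_p=0.75, "cat" (0.5) and "dog" (0.3) together pass the threshold, so only those two survive and are renormalized to 0.625 and 0.375; sampling then happens among them.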
API Methods Reference
| Method / Property | Description | Returns |
|---|---|---|
| client.chat.complete(model, messages, ...) | Single request/response chat completion. The returned message has content and, optionally, tool_calls. | ChatCompletionResponse; text in choices[0].message.content (str) |
| client.chat.stream(model, messages, ...) | Streaming completion. Returns an event stream that can be used as a context manager. | Iterable of CompletionEvent; text in data.choices[0].delta.content per chunk |
| client.embeddings.create(model, inputs) | Generate embeddings. inputs can be a string or list[str]. | EmbeddingResponse with data[].embedding (list[float]) |
| client.models.list() | List the models available to your API key. | ModelList with data[] containing model info |
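The tool_calls returned by client.chat.complete() still have to be executed by your own code. A minimal sketch, assuming a hypothetical local `get_weather` function and a hand-built stand-in for the SDK's tool call object (so it runs without an API key): parse the arguments (a JSON string or an already-parsed dict), call the matching function, and build a `"role": "tool"` message to send back.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable


def get_weather(location: str, unit: str = "celsius") -> dict[str, Any]:
    # Hypothetical implementation; replace with a real weather lookup.
    return {"location": location, "temperature": 18, "unit": unit}


# Maps function names the model may call to local Python callables
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(tool_call: Any) -> dict[str, str]:
    """Execute one tool call and build the tool message for the next request."""
    raw_args = tool_call.function.arguments
    # Arguments may be a JSON string or an already-parsed dict
    args = json.loads(raw_args) if isinstance(raw_args, str) else dict(raw_args)
    result = TOOL_REGISTRY[tool_call.function.name](**args)
    return {
        "role": "tool",
        "name": tool_call.function.name,
        "content": json.dumps(result),
        "tool_call_id": tool_call.id,
    }


# Stand-in for response.choices[0].message.tool_calls[0]; no API call is made
fake_call = SimpleNamespace(
    id="call_123",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Paris"}'),
)
print(run_tool_call(fake_call))
```

In a real loop you would append the assistant message and this tool message to `messages` and call `client.chat.complete()` again so the model can use the result.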
Common Errors & Fixes
Authentication error (HTTP 401): invalid API key. Cause: MISTRAL_API_KEY is not set, has expired, or the environment variable name has a typo.
import os
from mistralai import Mistral

api_key = os.environ.get("MISTRAL_API_KEY")
if not api_key:
    raise ValueError("MISTRAL_API_KEY not set in environment")
client = Mistral(api_key=api_key)

Model not found: invalid model name. Cause: the model name is outdated or not available in your region/tier.
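# List available models first
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
models = client.models.list()
for model in models.data:
    print(model.id)

# Then use a valid chat model (the first listed model may be an embedding model)
chat_model_ids = [m.id for m in models.data if m.capabilities.completion_chat]
response = client.chat.complete(
    model=chat_model_ids[0],
    messages=[{"role": "user", "content": "Hi"}]
)

Invalid tool schema. Cause: the "parameters" value is not a valid JSON Schema object (for example, a wrong "type" value or malformed "properties"). Omitting "required" is valid JSON Schema but makes every argument optional, so the model may call the tool without arguments it needs.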
# RISKY: no 'required', so the model may omit 'location'
tools = [{"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}}}]

# BETTER: add a description and list mandatory arguments in 'required'
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]  # list mandatory arguments here
        }
    }
}]

Rate limit error (HTTP 429): too many requests. Cause: exceeded API rate limits (requests per minute or tokens per minute).
import os
import time
from mistralai import Mistral, models

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

def chat_with_retry(messages, retries=3, backoff=2):
    for attempt in range(retries):
        try:
            return client.chat.complete(
                model="mistral-large-latest",
                messages=messages
            )
        except models.SDKError as e:
            # Assumes the v1 SDK's SDKError exposes the HTTP status code
            if e.status_code == 429 and attempt < retries - 1:
                wait = backoff ** attempt  # 1s, 2s, 4s, ...
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
            else:
                raise

Context length exceeded: input too long. Cause: prompt tokens + max_tokens exceed the model's context window (e.g., 128k tokens for Mistral Large 2).
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

CONTEXT_WINDOW = 128_000   # Mistral Large 2; adjust for the target model
MAX_OUTPUT_TOKENS = 2_000
CHARS_PER_TOKEN = 4        # rough heuristic for English text; use a real tokenizer (e.g. mistral-common) for exact counts

long_text = "Your long document here..."

# Estimate the prompt size and truncate if needed, leaving room for the output
max_prompt_tokens = CONTEXT_WINDOW - MAX_OUTPUT_TOKENS - 1_000  # safety buffer
max_chars = max_prompt_tokens * CHARS_PER_TOKEN
if len(long_text) > max_chars:
    long_text = long_text[:max_chars]
messages = [{"role": "user", "content": long_text}]

response = client.chat.complete(
    model="mistral-large-latest",
    messages=messages,
    max_tokens=MAX_OUTPUT_TOKENS
)

Production Gotchas
mistral-large-latest and mistral-medium-latest are aliases that point to dated models (e.g., mistral-large-2407), and Mistral repoints them when new versions ship. Pin exact versions in production and tests, use '-latest' only during development, and re-check quality after every upgrade.
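One way to enforce pinning is to reject alias names when loading configuration. A minimal sketch; it assumes, based on the mistral-large-2407 example above, that pinned IDs end in a four-digit version tag, and `require_pinned` is a hypothetical helper.

```python
import re

# Assumed convention: pinned IDs end in a 4-digit version tag, e.g. "-2407"
PINNED_SUFFIX = re.compile(r"-\d{4}$")


def require_pinned(model_id: str) -> str:
    """Reject moving aliases such as 'mistral-large-latest'."""
    if not PINNED_SUFFIX.search(model_id):
        raise ValueError(f"model {model_id!r} is not pinned to a dated version")
    return model_id


print(require_pinned("mistral-large-2407"))
```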
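Tool definitions use the OpenAI-style function format: "type": "function", plus function.name, function.description, and function.parameters as a JSON Schema object with type, properties, and required. Schema mistakes, such as a wrong "type" string or a missing "required" array, may not raise an error; instead the model may ignore the tool or call it with incomplete arguments. Validate schemas, for example with a JSON Schema validator, before deploying.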
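Before reaching for a full JSON Schema validator (such as the `jsonschema` package), a stdlib-only structural check catches the common mistakes described above. A minimal sketch; `check_tool` is a hypothetical helper and only inspects the fields used in this sheet's tool examples.

```python
from typing import Any


def check_tool(tool: dict[str, Any]) -> list[str]:
    """Return structural problems in one tool definition (empty list = looks OK).
    A lightweight sanity check, not a full JSON Schema validator."""
    problems: list[str] = []
    if tool.get("type") != "function":
        problems.append("tool 'type' must be 'function'")
    fn = tool.get("function", {})
    if not fn.get("name"):
        problems.append("function 'name' is missing")
    if not fn.get("description"):
        problems.append("function 'description' is missing (recommended)")
    params = fn.get("parameters", {})
    if params.get("type") != "object":
        problems.append("parameters 'type' must be 'object'")
    props = params.get("properties")
    if not isinstance(props, dict):
        problems.append("parameters 'properties' must be a dict")
        props = {}
    required = params.get("required")
    if required is None:
        problems.append("no 'required' list: every argument is optional")
    else:
        # Every required name must be declared in properties
        for name in required:
            if name not in props:
                problems.append(f"required argument {name!r} is not in 'properties'")
    return problems


risky = {"type": "function", "function": {"name": "get_weather",
         "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}}}
for problem in check_tool(risky):
    print(problem)
```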
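mistral-embed outputs 1024-dimensional vectors; OpenAI's text-embedding-3-small and text-embedding-ada-002 output 1536. When migrating, re-embed all documents and rebuild the vector indexes. Vectors from different embedding models are not comparable, so a mixed index returns meaningless cosine similarities even when no error is raised.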
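Comparing embeddings usually means cosine similarity. This minimal pure-Python sketch refuses vectors of different lengths instead of silently computing something meaningless; `cosine_similarity` is a hypothetical helper, and a vector library such as NumPy would be faster in practice.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity that fails loudly on mismatched dimensions."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

With the embeddings example above, `cosine_similarity(response.data[0].embedding, response.data[1].embedding)` compares the two input texts.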
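Prefer `with client.chat.stream(...) as response:` so the HTTP connection is closed even if you stop reading early; an abandoned, unclosed stream may keep generating (and billing) tokens you never read. Always check chunk.data.choices[0].delta.content before using it: it may be None or empty.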
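A common pattern is to accumulate the streamed text while skipping empty deltas. A minimal sketch; `collect_stream` and `fake_event` are hypothetical helpers, and the fake events only mimic the `event.data.choices[0].delta.content` shape so the example runs offline.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_stream(events: Iterable[Any]) -> str:
    """Concatenate streamed text, skipping chunks whose delta.content is None or empty."""
    parts: list[str] = []
    for event in events:
        choices = event.data.choices
        if not choices:
            continue
        content = choices[0].delta.content
        # Only plain, non-empty strings are appended
        if isinstance(content, str) and content:
            parts.append(content)
    return "".join(parts)


def fake_event(text):
    # Mimics event.data.choices[0].delta.content for offline testing
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(data=SimpleNamespace(choices=[SimpleNamespace(delta=delta)]))


events = [fake_event("Hello"), fake_event(None), fake_event(", world"), fake_event("")]
print(collect_stream(events))
```

With the SDK, call it inside the `with client.chat.stream(...) as response:` block, e.g. `text = collect_stream(response)`, so the stream is still closed afterwards.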
The Mistral API doesn't guarantee identical outputs across calls, even with temperature=0. For more reproducible results, also set random_seed and top_p=1.0, and verify the behavior before release.
The system message must be the first entry in the messages array, before any user or assistant messages; placing it in the middle or at the end causes unpredictable behavior. Use the order [system, user, assistant, user, ...].
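The ordering rule above can be checked before each request. A minimal sketch; `check_message_order` is a hypothetical helper that enforces only the system-first rule, not the full turn structure.

```python
from typing import Any


def check_message_order(messages: list[dict[str, Any]]) -> None:
    """Raise if a system message appears after any non-system message."""
    seen_non_system = False
    for i, msg in enumerate(messages):
        if msg["role"] == "system":
            if seen_non_system:
                raise ValueError(f"system message at index {i} follows user/assistant turns")
        else:
            seen_non_system = True


check_message_order([
    {"role": "system", "content": "You are a Python expert."},
    {"role": "user", "content": "How do I read a CSV file?"},
])
print("order OK")
```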
Mistral Models Comparison
| Model | Best For | Context Window | Price (€ per 1M tokens) | Speed |
|---|---|---|---|---|
| mistral-large-latest | Complex reasoning, function calling, long docs | 128k tokens | €0.27/M input, €0.81/M output | Slower |
| mistral-small-latest | Simple tasks, high throughput, cost-sensitive | 32k tokens | €0.04/M input, €0.12/M output | Fastest |
| codestral-latest | Code generation, completions, programming | 32k tokens | €0.27/M input, €0.81/M output | Fast |
| mistral-nemo-latest | Lightweight, edge, low latency, efficiency | 128k tokens | €0.03/M input, €0.09/M output | Fastest |
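A rough request cost is (input tokens × input price + output tokens × output price) / 1,000,000. A minimal sketch using the mistral-small-latest row as listed in the table; prices change, so treat the numbers as placeholders, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimated cost in the table's currency for the given token counts."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000


# 2M input + 0.5M output tokens at the listed mistral-small-latest prices
print(round(estimate_cost(2_000_000, 500_000, 0.04, 0.12), 4))
```

Here 2 × 0.04 + 0.5 × 0.12 = 0.14, so about €0.14 at the listed prices.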