Cheat Sheet intermediate · 8 min read

Mistral AI Cheat Sheet — API, Models & Parameters — Mistral

version 0.4.x

Mistral's fast, open inference API

MISTRAL_API_KEY

install pip install mistralai

core imports

python

from mistralai import Mistral
from mistralai.models import UserMessage, SystemMessage

Mental model

Mistral is a hosted inference API for Mistral AI's open-weight and commercial models, with built-in function calling.

Like a European-hosted endpoint for Mistral models: you send chat messages (optionally with tool definitions) and get back text or structured tool calls. Whether latency and cost beat US-based APIs depends on the model and workload, so benchmark before assuming it.

Common Use Patterns

01 Basic Chat Completion

Simple prompt → response, no streaming

python

import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

message = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "user", "content": "What is 2+2?"}
    ]
)
print(message.choices[0].message.content)

output 4

Model names change: 'latest' aliases are repointed to newer dated releases (e.g., mistral-large-2407) over time. Pin a dated version in production and test before upgrading; use 'latest' aliases for development.

02 Streaming Response

Real-time token output for UI/chat

python

import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

with client.chat.stream(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Write a poem about coding"}]
) as response:
    for chunk in response:
        if chunk.data.choices[0].delta.content:
            print(chunk.data.choices[0].delta.content, end="", flush=True)

Use the stream as a context manager so the HTTP connection is closed even if iteration stops early or raises. Streaming chunks may carry no text: always check delta.content before use.
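
To keep the full reply as well as printing it, the deltas can be accumulated as they arrive. This is a minimal sketch, assuming each chunk exposes data.choices[0].delta.content as in the example above; collect_stream_text is a hypothetical helper (not part of the SDK), and the stand-in chunks exist only so the snippet runs offline.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream_text(chunks: Iterable) -> str:
    """Join the text deltas of a stream, skipping empty or missing content."""
    parts = []
    for chunk in chunks:
        choices = chunk.data.choices
        if not choices:
            continue  # tolerate events that carry no choices
        delta = choices[0].delta.content
        if delta:  # None and "" are both skipped
            parts.append(delta)
    return "".join(parts)


def _fake_chunk(content):
    """Stand-in shaped like a streamed event (assumption, for the offline demo only)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(data=SimpleNamespace(choices=[SimpleNamespace(delta=delta)]))


demo_text = collect_stream_text(
    [_fake_chunk("Hel"), _fake_chunk(None), _fake_chunk(""), _fake_chunk("lo")]
)
```

Inside the `with client.chat.stream(...) as response:` block, `full_text = collect_stream_text(response)` would return the whole reply once the stream ends.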

03 Function Calling

LLM decides to call external functions, not just text

python

import os
import json
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

message = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)

if message.choices[0].message.tool_calls:
    for tool_call in message.choices[0].message.tool_calls:
        print(f"Function: {tool_call.function.name}")
        print(f"Args: {tool_call.function.arguments}")

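Tool schemas follow the OpenAI-style function format (type, function.name, function.parameters as JSON Schema). A missing 'required' array or wrong 'type' values may not raise an error; the model may simply omit arguments or skip the tool. Always validate schemas before deploy.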
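
The example above only prints the requested call. A possible next step is to run the function locally and wrap its result as a tool message for a follow-up request. This is a sketch under assumptions: get_weather here is a hypothetical stub, TOOL_REGISTRY and run_tool_call are illustrative names, and the tool-message fields (role, name, content, tool_call_id) reflect the commonly documented shape, so check them against the current API reference.

```python
import json


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Hypothetical local implementation backing the get_weather tool."""
    return {"location": location, "unit": unit, "forecast": "unknown (stub)"}


# Maps tool names from the schema to local callables
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments, tool_call_id: str) -> dict:
    """Execute one tool call and wrap the result as a role="tool" message."""
    # Arguments usually arrive as a JSON string; accept a dict as well
    args = json.loads(arguments) if isinstance(arguments, str) else dict(arguments)
    if name not in TOOL_REGISTRY:
        result = {"error": f"unknown tool: {name}"}
    else:
        result = TOOL_REGISTRY[name](**args)
    return {
        "role": "tool",
        "name": name,
        "content": json.dumps(result),
        "tool_call_id": tool_call_id,
    }


tool_message = run_tool_call("get_weather", '{"location": "Paris"}', "call_0")
```

In a real loop you would pass tool_call.function.name, tool_call.function.arguments, and tool_call.id, append the assistant message and the returned tool message to messages, and call client.chat.complete again to get the final answer.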

04 Generate Embeddings

Convert text to vectors for similarity search, RAG

python

import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.embeddings.create(
    model="mistral-embed",
    inputs=["What is machine learning?", "Define AI"]
)

for i, embedding in enumerate(response.data):
    print(f"Text {i}: {len(embedding.embedding)} dimensions")

output

Text 0: 1024 dimensions
Text 1: 1024 dimensions

Mistral Embed produces 1024-dim vectors, not 1536. If you're migrating from OpenAI, re-embed everything and rebuild your indexes; otherwise you will hit dimension mismatches.
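
For similarity search, embeddings are usually compared with cosine similarity. The sketch below is plain Python with toy 2-dimensional vectors, not SDK code; it also raises on a dimension mismatch, which is the failure you would see when mixing 1024-dim and 1536-dim vectors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors: same direction, then perpendicular
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
```

With real data you would pass response.data[i].embedding for each text.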

05 System Prompt & Context

Give LLM a role or behavior instructions

python

import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

message = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "system", "content": "You are a Python expert. Answer only with code."},
        {"role": "user", "content": "How do I read a CSV file?"}
    ]
)
print(message.choices[0].message.content)

System message goes in 'role': 'system' in the array, not as a separate parameter. Order matters: system before user messages.

06 Control Randomness & Tokens

Tune creativity (temperature) or output length (max_tokens)

python

import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Creative: temperature=0.9
response_creative = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Write a haiku"}],
    temperature=0.9,
    max_tokens=100
)

# Deterministic: temperature=0.0
response_precise = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    temperature=0.0,
    max_tokens=50
)
print(response_precise.choices[0].message.content)

temperature=0.0 is not guaranteed deterministic across calls or API versions. For the most reproducible results, set temperature=0.0, top_p=1.0, and a fixed random_seed together.

Core Request Parameters

chat.complete() parameters

Parameter	Type	Default	Notes
`model`	string	required	mistral-large-latest, mistral-medium-latest, mistral-small-latest, codestral-latest, mistral-nemo-latest
`messages`	array[dict]	required	Array of {"role": "user\|assistant\|system", "content": "..."}
`temperature`	float	0.7	0.0–1.0. Higher = more creative, lower = more deterministic
`max_tokens`	integer	4096	Max output tokens. Prompt tokens + max_tokens must fit the model's context window (32k for Mistral Large per the comparison table)
`top_p`	float	1.0	Nucleus sampling. 0.8 = sample from top 80% probability mass
`min_tokens`	integer	null	Minimum tokens to generate before stopping
`stop`	string\|array	null	Stop sequence(s). e.g., ["\n\n", "END"]
`random_seed`	integer	null	Set for reproducible outputs (with temperature=0)
`tools`	array[dict]	null	Function definitions for function calling
`tool_choice`	string\|object	auto	"auto" \| "any" \| "none" \| {"type": "function", "function": {"name": "..."}}

API Methods Reference

Method / Property	Description	Returns
`client.chat.complete(model, messages, ...)`	Single request-response chat completion.	ChatCompletionResponse with choices[0].message.content (str) and optional choices[0].message.tool_calls
`client.chat.stream(model, messages, ...)`	Streaming completion. Returns a context manager that yields chunks.	Iterator of events; new text is in chunk.data.choices[0].delta.content
`client.embeddings.create(model, inputs)`	Generate embeddings for texts. inputs can be string or list[str].	EmbeddingResponse with data[].embedding (list[float])
`client.models.list()`	List available models for your API key.	ModelListResponse with data[] containing model info

Common Errors & Fixes

01 AuthenticationError: Invalid API key

Cause: MISTRAL_API_KEY not set, expired, or typo in env var name

Fix:

python

import os
from mistralai import Mistral

api_key = os.environ.get("MISTRAL_API_KEY")
if not api_key:
    raise ValueError("MISTRAL_API_KEY not set in environment")
client = Mistral(api_key=api_key)

02 ModelNotFoundError: Model not found

Cause: Model name is outdated or not available in your region/tier

Fix:

python

# List available models first
from mistralai import Mistral
import os

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
models = client.models.list()
for model in models.data:
    print(model.id)

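# Then pick a chat model that is actually listed
# (models.data[0] may be an embedding or other non-chat model)
available = {model.id for model in models.data}
preferred = ["mistral-large-latest", "mistral-small-latest"]
model_id = next((name for name in preferred if name in available), None)
if model_id is None:
    raise RuntimeError("None of the preferred chat models are available")

message = client.chat.complete(
    model=model_id,
    messages=[{"role": "user", "content": "Hi"}]
)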

03 ValidationError: Invalid tool schema

Cause: Tool schema missing 'required' array or 'type' mismatch in parameters

Fix:

python

# WRONG: missing 'required'
tools = [{"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}}}]

# CORRECT: include 'required'
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]  # ADD THIS
        }
    }
}]

04 RateLimitError: Too many requests

Cause: Exceeded API rate limits (requests/min or tokens/min)

Fix:

python

import time
from mistralai import Mistral
import os

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

def chat_with_retry(messages, retries=3, backoff=2):
    for attempt in range(retries):
        try:
            return client.chat.complete(
                model="mistral-large-latest",
                messages=messages
            )
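        except Exception as e:
            # Match on HTTP 429 (assumes the SDK error exposes status_code),
            # with the exception class name as a fallback
            rate_limited = (
                getattr(e, "status_code", None) == 429
                or "RateLimit" in type(e).__name__
            )
            if rate_limited and attempt < retries - 1:
                wait = backoff ** attempt
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
            else:
                raise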

05 ContextLengthExceededError: Input exceeds max tokens

Cause: Prompt + max_tokens exceeds model's context window (e.g., 32k for Large)

Fix:

python

from mistralai import Mistral
import os

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Rough pre-check before sending (a character-based estimate, not a real token count)
long_text = "Your long document here..."

# Truncate if needed. English text averages roughly 4 characters per token;
# 3 is used here to stay conservative.
CHARS_PER_TOKEN = 3
max_content_tokens = 30000  # Leave buffer for max_tokens
max_chars = max_content_tokens * CHARS_PER_TOKEN
if len(long_text) > max_chars:
    long_text = long_text[:max_chars]
messages = [{"role": "user", "content": long_text}]

message = client.chat.complete(
    model="mistral-large-latest",
    messages=messages,
    max_tokens=2000
)

Production Gotchas

⚠ Model names are aliases, not stable

mistral-large-latest and mistral-medium-latest point to dated models (e.g., mistral-large-2407), and Mistral repoints them periodically. Pin exact versions in tests and production; use 'latest' only in dev. Always verify performance after upgrades.
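
One way to apply this is to resolve the model ID from the deployment environment instead of hard-coding it at each call site. A minimal sketch: the APP_ENV variable, the mapping, and resolve_model are hypothetical, and the dated ID is only the example named above, so replace it with whichever version you have actually tested.

```python
import os
from typing import Optional

# Hypothetical mapping: alias in dev, pinned dated release elsewhere
MODEL_BY_ENV = {
    "dev": "mistral-large-latest",
    "test": "mistral-large-2407",
    "prod": "mistral-large-2407",
}


def resolve_model(env: Optional[str] = None) -> str:
    """Return the model ID for an environment, defaulting to APP_ENV or "dev"."""
    env = env or os.environ.get("APP_ENV", "dev")
    try:
        return MODEL_BY_ENV[env]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None
```

Calls then become `client.chat.complete(model=resolve_model(), ...)`, and an upgrade is a one-line change to the mapping.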

⚠ Function calling schema is strict OpenAI format

Tool definitions must follow the OpenAI-style spec (type, properties, required). A missing 'required' array or wrong 'type' string values may not raise an error; the model can instead skip the tool or omit arguments. Validate schemas with a JSON Schema validator before deploy.
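
A lightweight pre-deploy check can catch the problems listed above before any request is sent. The sketch below only mirrors the rules stated in this cheat sheet; tool_schema_problems is a hypothetical helper, not an official validator, and full JSON Schema validation would need a dedicated library.

```python
def tool_schema_problems(tool: dict) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if tool.get("type") != "function":
        problems.append('top-level "type" should be "function"')
    fn = tool.get("function")
    if not isinstance(fn, dict):
        return problems + ['missing "function" object']
    if not isinstance(fn.get("name"), str) or not fn.get("name"):
        problems.append('"function.name" should be a non-empty string')
    params = fn.get("parameters")
    if not isinstance(params, dict):
        return problems + ['missing "function.parameters" object']
    if params.get("type") != "object":
        problems.append('"parameters.type" should be "object"')
    props = params.get("properties", {})
    required = params.get("required")
    if required is None:
        problems.append('"parameters.required" is missing')
    else:
        for key in required:
            if key not in props:
                problems.append(f'required key "{key}" is not in properties')
    return problems


good_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}
# Same tool with the "required" array dropped
bad_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
        },
    },
}
```

Running the check over every entry in tools at startup, and failing fast on a non-empty result, keeps schema mistakes out of production traffic.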

⚠ Embedding dimension mismatch with OpenAI

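Mistral Embed outputs 1024 dims, OpenAI is 1536. If migrating, you must re-embed all documents and rebuild vector indexes. Vectors from different models are not comparable even when padded or truncated to the same length, and mixed dimensions make cosine similarity fail or return meaningless scores.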

⚠ Streaming context manager must close properly

Prefer `with client.chat.stream(...) as response:` so the connection is released when the block exits. An abandoned stream may keep the connection open, and you may still be billed for tokens you never read. Always check chunk.data.choices[0].delta.content: it may be None/empty.

⚠ temperature=0.0 is not truly deterministic

Mistral API doesn't guarantee identical outputs across calls with temperature=0. For reproducible results, also set random_seed and top_p=1.0, and test before release.
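
The settings above can be bundled so every call that needs repeatable output uses the same values. A small sketch: sampling_kwargs is a hypothetical helper, the seed value is arbitrary, and the parameter names are the ones listed in the parameters table.

```python
def sampling_kwargs(reproducible: bool, seed: int = 42) -> dict:
    """Keyword arguments for chat.complete(); the seed value is arbitrary."""
    if reproducible:
        # Greedy-style decoding plus a fixed seed, as recommended above
        return {"temperature": 0.0, "top_p": 1.0, "random_seed": seed}
    # Creative setting used in the randomness example
    return {"temperature": 0.9, "top_p": 1.0}


precise = sampling_kwargs(reproducible=True)
creative = sampling_kwargs(reproducible=False)
```

Usage would look like `client.chat.complete(model=..., messages=..., **sampling_kwargs(True))`. Even then, treat identical outputs as likely rather than guaranteed.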

⚠ System message order matters

System role must appear before user/assistant messages in the array. Putting it in the middle or end causes unpredictable behavior. Always: [system, user, assistant, user, ...]
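
The ordering rule can be checked mechanically before a request goes out. A minimal sketch, assuming messages are plain dicts as in the examples above; system_message_problems is a hypothetical helper.

```python
def system_message_problems(messages: list) -> list:
    """Return problems with system-message placement; empty list means OK."""
    problems = []
    positions = [i for i, m in enumerate(messages) if m.get("role") == "system"]
    if len(positions) > 1:
        problems.append("more than one system message")
    if positions and positions[0] != 0:
        problems.append("system message is not first")
    return problems


ok_messages = [
    {"role": "system", "content": "You are a Python expert."},
    {"role": "user", "content": "How do I read a CSV file?"},
]
misordered = [
    {"role": "user", "content": "How do I read a CSV file?"},
    {"role": "system", "content": "You are a Python expert."},
]
```

A conversation without any system message passes the check, since the rule only constrains where a system message sits when one is present.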

Mistral Models Comparison

Model	Best For	Context Window	Cost (€ per 1M tokens)	Speed
mistral-large-latest	Complex reasoning, function calling, long docs	32k tokens	€0.27/M input, €0.81/M output	Slower
mistral-small-latest	Simple tasks, high throughput, cost-sensitive	32k tokens	€0.04/M input, €0.12/M output	Fastest
codestral-latest	Code generation, completions, programming	32k tokens	€0.27/M input, €0.81/M output	Fast
mistral-nemo-latest	Lightweight, edge, low latency, efficiency	128k tokens	€0.03/M input, €0.09/M output	Fastest
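
The per-million prices translate into a per-request estimate with simple arithmetic. The sketch below copies the prices from the table above; they are a snapshot that may be out of date, and estimate_cost_eur is a hypothetical helper.

```python
# (input, output) price in EUR per 1M tokens, copied from the table above
PRICES_EUR_PER_M = {
    "mistral-large-latest": (0.27, 0.81),
    "mistral-small-latest": (0.04, 0.12),
    "codestral-latest": (0.27, 0.81),
    "mistral-nemo-latest": (0.03, 0.09),
}


def estimate_cost_eur(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in EUR for the given token counts."""
    in_price, out_price = PRICES_EUR_PER_M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# 2M input tokens and 0.5M output tokens on the small model
cost = estimate_cost_eur("mistral-small-latest", 2_000_000, 500_000)
```

For that workload the estimate comes to about €0.14 (2 × 0.04 + 0.5 × 0.12).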

Key Concepts

Function Calling

Mistral returns structured function names and arguments when tools are available, enabling external API orchestration without separate prompt parsing.

Tool Choice

Parameter controlling whether LLM uses tools: 'auto' (model decides), 'any' (must use a tool), or specific function name to force one tool.

Context Window

Maximum input + output tokens the model can process; mistral-large and mistral-small support 32k, nemo supports 128k.

Temperature

Controls randomness: 0.0 = near-deterministic (always favors the most likely token), 0.7 = balanced (default), values near 1.0 = highly creative. Higher = more diverse outputs.
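
The effect of temperature can be seen on a toy distribution: logits are divided by the temperature before the softmax, so low values sharpen the distribution and high values flatten it. This is an illustrative sketch of the general technique with made-up logits, not a description of Mistral's internal sampler.

```python
import math


def softmax_with_temperature(logits: list, temperature: float) -> list:
    """Softmax over logits scaled by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(toy_logits, 0.2)  # most mass on the top token
flat = softmax_with_temperature(toy_logits, 1.0)   # mass spread more evenly
```

At 0.2 the top token takes nearly all of the probability mass, while at 1.0 the other two tokens keep a meaningful share.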

Top-P (Nucleus Sampling)

Alternative to temperature: sample from smallest set of tokens whose probabilities sum to p. 0.8 = top 80% probability mass.
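
The "smallest set whose probabilities sum to p" rule can be shown on a toy distribution. The sketch below only performs the filtering step (a real sampler would then renormalize and draw from the kept tokens); the token probabilities are made up.

```python
def nucleus(probs: dict, p: float) -> list:
    """Smallest set of tokens, most likely first, whose probabilities sum to at least p."""
    kept, cumulative = [], 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(token)
        cumulative += prob
        if cumulative >= p:
            break
    return kept


toy_probs = {"the": 0.5, "a": 0.3, "an": 0.15, "this": 0.05}
kept = nucleus(toy_probs, 0.7)
```

With p=0.7 only "the" and "a" survive, so the two least likely tokens can never be sampled.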

Embeddings

Dense vectors (1024-dim for Mistral Embed) representing text meaning, used for semantic search, RAG, and similarity.

Verified 2026-04 · vmistral-sdk 0.4.x · mistral-large
