Code beginner · 3 min read

How to call Mistral API in Python

Direct answer
Use the mistralai Python SDK by importing Mistral, initializing the client with your API key from os.environ, and calling client.chat.complete() with your model and messages.

Setup

Install
bash
pip install mistralai
Env vars
MISTRAL_API_KEY
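Before running any code, export the key in your shell so that os.environ can see it (the key value below is a placeholder, not a real key):

```shell
# Make the key available to Python via os.environ (replace with your real key)
export MISTRAL_API_KEY="your_api_key_here"
```

On Windows, use `setx MISTRAL_API_KEY "your_api_key_here"` in a command prompt instead.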
Imports
python
from mistralai import Mistral
import os

Examples

In: Hello, how are you?
Out: I'm doing great, thank you! How can I assist you today?

In: Explain the benefits of using Mistral API.
Out: Mistral API offers fast, reliable, and state-of-the-art large language models with easy integration and competitive pricing.

In: (empty)
Out: Please provide a prompt to generate a response.
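The empty-input case above is worth guarding against in your own code before you spend an API call. A minimal sketch (the `safe_prompt` helper is ours for illustration, not part of the SDK):

```python
def safe_prompt(prompt):
    """Return a messages list for the API, or None if the prompt is empty."""
    if not prompt or not prompt.strip():
        print("Please provide a prompt to generate a response.")
        return None
    return [{"role": "user", "content": prompt}]

# Only call the API when safe_prompt returns a messages list
messages = safe_prompt("Hello, how are you?")
```

Check the return value for None before passing it to client.chat.complete().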

Integration steps

  1. Install the mistralai package via pip.
  2. Set your MISTRAL_API_KEY environment variable securely.
  3. Import Mistral and os modules in your Python script.
  4. Initialize the Mistral client with the API key from os.environ.
  5. Create a messages list with role and content for the chat completion.
  6. Call client.chat.complete() with the model and messages.
  7. Extract and use the response from response.choices[0].message.content.

Full code

python
from mistralai import Mistral
import os

# Initialize client with API key from environment
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Prepare chat messages
messages = [{"role": "user", "content": "Hello, how are you?"}]

# Call the chat completion endpoint
response = client.chat.complete(
    model="mistral-large-latest",
    messages=messages
)

# Extract and print the response text
print("Response:", response.choices[0].message.content)
output
Response: I'm doing great, thank you! How can I assist you today?

API trace

Request
json
{"model": "mistral-large-latest", "messages": [{"role": "user", "content": "Hello, how are you?"}]}
Response
json
{"choices": [{"message": {"role": "assistant", "content": "I'm doing great, thank you! How can I assist you today?"}}], "usage": {"prompt_tokens": 10, "completion_tokens": 15, "total_tokens": 25}}
Extract: response.choices[0].message.content

Variants

Streaming response

Use streaming to display partial responses in real time, which gives a better user experience for long outputs.

python
from mistralai import Mistral
import os

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

messages = [{"role": "user", "content": "Tell me a story."}]

# Streaming chat completion
stream = client.chat.stream(model="mistral-large-latest", messages=messages)
for chunk in stream:
    content = chunk.data.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()
Async call

Use async calls to integrate the Mistral API into asynchronous Python applications that need concurrency.

python
import asyncio
from mistralai import Mistral
import os

async def main():
    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    messages = [{"role": "user", "content": "What is AI?"}]
    response = await client.chat.complete_async(model="mistral-large-latest", messages=messages)
    print("Async response:", response.choices[0].message.content)

asyncio.run(main())
Use smaller model for faster response

Use the smaller model for quicker responses with lower cost when high detail is not required.

python
from mistralai import Mistral
import os

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
messages = [{"role": "user", "content": "Summarize the latest news."}]
response = client.chat.complete(model="mistral-small-latest", messages=messages)
print("Summary:", response.choices[0].message.content)

Performance

Latency: ~700ms for `mistral-large-latest` non-streaming calls
Cost: ~$0.0015 per 1,000 tokens for `mistral-large-latest`
Rate limits: Default tier: 300 requests per minute, 60,000 tokens per minute
  • Use concise prompts to reduce token usage.
  • Prefer smaller models for less critical tasks to save cost.
  • Cache frequent queries to avoid repeated calls.
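The caching tip above can be sketched with functools.lru_cache. Here `call_mistral` is a placeholder standing in for whatever function wraps client.chat.complete() in your app; this is an illustration of the caching pattern, not SDK API:

```python
from functools import lru_cache

def call_mistral(prompt):
    # Placeholder: swap in a real client.chat.complete() call here
    return f"response for: {prompt}"

@lru_cache(maxsize=256)
def cached_call(prompt):
    # Identical prompts are served from the cache instead of a new API call
    return call_mistral(prompt)
```

Note that lru_cache keys on the prompt string, so it only helps when users repeat exactly the same query.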
| Approach | Latency | Cost/call | Best for |
| --- | --- | --- | --- |
| Standard call | ~700ms | ~$0.0015/1k tokens | General purpose, balanced speed and cost |
| Streaming response | Starts within 300ms | Same as standard | Long outputs, better UX |
| Async call | ~700ms | ~$0.0015/1k tokens | Concurrent applications |
| Smaller model | ~300ms | ~$0.0005/1k tokens | Faster, cheaper, less detailed |

Quick tip

Always set your API key in the environment variable `MISTRAL_API_KEY` and never hardcode it in your code.

Common mistake

Beginners often pass the `messages` parameter as a bare string instead of a list of dicts, each with a role and content key, causing API errors.
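A quick shape check catches this mistake before the request is sent. The `validate_messages` helper below is illustrative, not part of the SDK:

```python
def validate_messages(messages):
    """Raise ValueError unless messages is a list of dicts with role and content."""
    if not isinstance(messages, list):
        raise ValueError("messages must be a list of dicts")
    for m in messages:
        if not isinstance(m, dict) or "role" not in m or "content" not in m:
            raise ValueError("each message needs 'role' and 'content' keys")

validate_messages([{"role": "user", "content": "Hello"}])  # passes silently
```

Call it right before client.chat.complete() so malformed input fails with a clear local error instead of an API error.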

Verified 2026-04 · mistral-large-latest, mistral-small-latest