How to use Codestral API in Python
Direct answer
Use the mistralai Python SDK, or the OpenAI-compatible OpenAI client with base_url="https://api.mistral.ai/v1", and specify model="codestral-latest" to call the Codestral API in Python.

Setup
Install
pip install mistralai

Env vars
MISTRAL_API_KEY

Imports
from mistralai import Mistral
import os

Examples
Input: Hello, can you write a Python function to reverse a string?
Output: Sure! Here's a Python function that reverses a string:
```python
def reverse_string(s):
    return s[::-1]
```
Input: Explain the concept of recursion with an example.
Output: Recursion is a programming technique where a function calls itself to solve smaller instances of a problem. For example:
```python
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)
```
Input: Generate a haiku about spring.
Output:
Gentle spring breeze blows,
Cherry blossoms softly fall,
New life wakes the earth.
Integration steps
- Install the mistralai Python package and set the MISTRAL_API_KEY environment variable.
- Import the Mistral client and initialize it with your API key from os.environ.
- Call the chat.complete method with model='codestral-latest' and a messages list containing user input.
- Extract the generated text from response.choices[0].message.content.
- Print or use the generated response in your application.
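The steps above produce a request body like the one shown in the API trace; this stdlib-only sketch builds and serializes that payload without making a network call:

```python
import json

# Build the same request body the integration steps describe
payload = {
    "model": "codestral-latest",
    "messages": [
        {"role": "user", "content": "Write a Python function to check if a number is prime."}
    ],
}

# Serialize it as the SDK would send it over HTTP
body = json.dumps(payload)
print(body)
```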
Full code

```python
from mistralai import Mistral
import os

# Initialize the Mistral client with your API key
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Define the user message
messages = [{"role": "user", "content": "Write a Python function to check if a number is prime."}]

# Call the Codestral model for chat completion
response = client.chat.complete(
    model="codestral-latest",
    messages=messages
)

# Extract and print the generated content
print("Response from Codestral:")
print(response.choices[0].message.content)
```

Output
Response from Codestral:
Here's a Python function to check if a number is prime:
```python
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True
```

API trace
Request
{"model": "codestral-latest", "messages": [{"role": "user", "content": "Write a Python function to check if a number is prime."}]}

Response
{"choices": [{"message": {"content": "Here's a Python function to check if a number is prime: ..."}}], "usage": {"prompt_tokens": 20, "completion_tokens": 80, "total_tokens": 100}}

Extract
response.choices[0].message.content

Variants
Streaming response
Use streaming to display partial results immediately for a better user experience on long responses.

```python
from mistralai import Mistral
import os

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
messages = [{"role": "user", "content": "Explain the quicksort algorithm."}]

# Streaming chat completion: each event wraps an incremental delta
for chunk in client.chat.stream(
    model="codestral-latest",
    messages=messages
):
    if chunk.data.choices[0].delta.content is not None:
        print(chunk.data.choices[0].delta.content, end="")
print()
```

Async version
Use async calls to handle multiple concurrent requests efficiently in asynchronous applications.

```python
import asyncio
import os

from mistralai import Mistral

async def main():
    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    messages = [{"role": "user", "content": "Summarize the benefits of AI."}]
    response = await client.chat.complete_async(
        model="codestral-latest",
        messages=messages
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```

Alternative model: mistral-large-latest
Use the general-purpose mistral-large-latest model for broader tasks beyond code generation.

```python
from mistralai import Mistral
import os

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
messages = [{"role": "user", "content": "Generate a poem about the ocean."}]
response = client.chat.complete(
    model="mistral-large-latest",
    messages=messages
)
print(response.choices[0].message.content)
```

Performance
Latency: ~1.2 seconds per 500 tokens for non-streaming calls
Cost: ~$0.003 per 1,000 tokens with the Codestral model
Rate limits: default tier is 300 requests per minute, 60,000 tokens per minute
- Keep prompts concise to reduce token usage.
- Use streaming to start processing output early and reduce perceived latency.
- Cache frequent queries to avoid repeated calls.
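The caching tip above can be sketched as a small wrapper. `make_cached_complete` is a hypothetical helper, not part of the mistralai SDK; the demo stubs out the API call so it runs offline:

```python
from typing import Callable, Dict

def make_cached_complete(complete: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a completion function so repeated prompts hit an in-memory cache."""
    cache: Dict[str, str] = {}

    def cached_complete(prompt: str) -> str:
        if prompt not in cache:
            cache[prompt] = complete(prompt)  # only call the API on a cache miss
        return cache[prompt]

    return cached_complete

# Demo with a stand-in for the real API call that counts invocations
calls = {"n": 0}

def fake_complete(prompt: str) -> str:
    calls["n"] += 1
    return f"answer to: {prompt}"

cached = make_cached_complete(fake_complete)
cached("Explain quicksort.")
cached("Explain quicksort.")  # served from cache, no second call
print(calls["n"])  # → 1
```

For production use you would likely bound the cache size or add expiry, since model responses can go stale.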
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard chat.complete | ~1.2s | ~$0.003/1K tokens | Simple synchronous calls |
| Streaming chat.stream | Starts immediately, ~1.2s total | ~$0.003/1K tokens | Long responses with better UX |
| Async chat.complete_async | ~1.2s | ~$0.003/1K tokens | Concurrent async applications |
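Given the rate limits above, wrapping calls in retry-with-exponential-backoff is a common pattern. `retry_with_backoff` is a hypothetical helper, not part of the SDK; the demo uses a stand-in flaky function rather than a real API call:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(fn: Callable[[], T], retries: int = 3, base_delay: float = 0.01) -> T:
    """Call fn, retrying on exceptions with exponentially growing sleeps."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...
    raise RuntimeError("unreachable")

# Demo: a call that fails twice, then succeeds on the third attempt
attempts = {"n": 0}

def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("simulated 429 rate-limit error")
    return "ok"

result = retry_with_backoff(flaky)
print(result)  # → ok
```

In a real client you would catch only rate-limit/transport errors rather than bare Exception, and use delays of a second or more.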
Quick tip
Always specify `model="codestral-latest"` and use the `chat.complete` method for best results with the Codestral API.
Common mistake
Beginners often forget to set the `MISTRAL_API_KEY` environment variable or use the wrong model name, causing authentication or model errors.
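A cheap way to avoid that mistake is a pre-flight configuration check before constructing the client. `check_config` is a hypothetical helper introduced here for illustration:

```python
def check_config(env: dict) -> list:
    """Return a list of configuration problems; empty if everything looks fine."""
    problems = []
    if not env.get("MISTRAL_API_KEY"):
        problems.append("MISTRAL_API_KEY is not set")
    return problems

# A correctly configured environment reports no problems
print(check_config({"MISTRAL_API_KEY": "sk-example"}))  # → []
```

Calling this with `dict(os.environ)` at startup turns a cryptic 401 at request time into an actionable message.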