Code beginner · 3 min read

How to use Codestral API in Python

Direct answer
Use the official mistralai Python SDK, or the OpenAI Python client pointed at base_url="https://api.mistral.ai/v1", and pass model="codestral-latest" to call the Codestral API from Python.
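The OpenAI-compatible route can be sketched as below. The helper names build_payload and ask_codestral are illustrative, not part of either SDK; the sketch assumes the openai package is installed and MISTRAL_API_KEY is set.

```python
# Hedged sketch: Codestral via the OpenAI-compatible /v1 endpoint.
import os

MISTRAL_BASE_URL = "https://api.mistral.ai/v1"

def build_payload(prompt: str) -> dict:
    # Request body shared by both the mistralai and openai clients.
    return {
        "model": "codestral-latest",
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_codestral(prompt: str) -> str:
    # Imported lazily so the sketch loads even without the package installed.
    from openai import OpenAI
    client = OpenAI(api_key=os.environ["MISTRAL_API_KEY"], base_url=MISTRAL_BASE_URL)
    response = client.chat.completions.create(**build_payload(prompt))
    return response.choices[0].message.content
```

Because the endpoint speaks the OpenAI wire format, switching between the two clients only changes how the request is sent, not the payload.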

Setup

Install
bash
pip install mistralai
Env vars
MISTRAL_API_KEY
Imports
python
from mistralai import Mistral
import os

Examples

In: Hello, can you write a Python function to reverse a string?
Out: Sure! Here's a Python function that reverses a string:

```python
def reverse_string(s):
    return s[::-1]
```

In: Explain the concept of recursion with an example.
Out: Recursion is a programming technique where a function calls itself to solve smaller instances of a problem. For example:

```python
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)
```

In: Generate a haiku about spring.
Out: Gentle spring breeze blows, / Cherry blossoms softly fall, / New life wakes the earth.

Integration steps

  1. Install the mistralai Python package and set the MISTRAL_API_KEY environment variable.
  2. Import the Mistral client and initialize it with your API key from os.environ.
  3. Call the chat.complete method with model='codestral-latest' and a messages list containing user input.
  4. Extract the generated text from response.choices[0].message.content.
  5. Print or use the generated response in your application.

Full code

python
from mistralai import Mistral
import os

# Initialize the Mistral client with your API key
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Define the user message
messages = [{"role": "user", "content": "Write a Python function to check if a number is prime."}]

# Call the Codestral model for chat completion
response = client.chat.complete(
    model="codestral-latest",
    messages=messages
)

# Extract and print the generated content
print("Response from Codestral:")
print(response.choices[0].message.content)
output
Response from Codestral:
Here's a Python function to check if a number is prime:

```python
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True
```

API trace

Request
json
{"model": "codestral-latest", "messages": [{"role": "user", "content": "Write a Python function to check if a number is prime."}]}
Response
json
{"choices": [{"message": {"content": "Here's a Python function to check if a number is prime: ..."}}], "usage": {"prompt_tokens": 20, "completion_tokens": 80, "total_tokens": 100}}
Extract: response.choices[0].message.content
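The trace above can be reproduced with only the standard library. The helper names codestral_request and extract_content are illustrative; the endpoint URL and extraction path come from the request/response shown above.

```python
# Hedged sketch: building the same REST request with urllib alone.
import json
import os
import urllib.request

def codestral_request(prompt: str) -> urllib.request.Request:
    # Same JSON body as the request trace above.
    body = json.dumps({
        "model": "codestral-latest",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        "https://api.mistral.ai/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('MISTRAL_API_KEY', '')}",
        },
        method="POST",
    )

def extract_content(response_json: dict) -> str:
    # Same extraction path as the SDK: choices[0].message.content
    return response_json["choices"][0]["message"]["content"]
```

Sending the request with urllib.request.urlopen and parsing the body with json.load yields the same dict shape the SDK wraps in objects.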

Variants

Streaming response

Use streaming to display partial results immediately for better user experience on long responses.

python
from mistralai import Mistral
import os

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

messages = [{"role": "user", "content": "Explain quicksort algorithm."}]

# Streaming chat completion: each chunk carries a delta with new tokens
stream = client.chat.stream(
    model="codestral-latest",
    messages=messages
)
for chunk in stream:
    content = chunk.data.choices[0].delta.content
    if content:
        print(content, end="")
print()
Async version

Use async calls to handle multiple concurrent requests efficiently in asynchronous applications.

python
import asyncio
from mistralai import Mistral
import os

async def main():
    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    messages = [{"role": "user", "content": "Summarize the benefits of AI."}]
    response = await client.chat.complete_async(
        model="codestral-latest",
        messages=messages
    )
    print(response.choices[0].message.content)

asyncio.run(main())
Alternative model: mistral-large-latest

Use the general-purpose mistral-large-latest model for broader tasks beyond code generation.

python
from mistralai import Mistral
import os

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
messages = [{"role": "user", "content": "Generate a poem about the ocean."}]
response = client.chat.complete(
    model="mistral-large-latest",
    messages=messages
)
print(response.choices[0].message.content)

Performance

Latency: ~1.2 seconds per 500 tokens for non-streaming calls
Cost: ~$0.003 per 1,000 tokens with the Codestral model
Rate limits: default tier, 300 requests per minute and 60,000 tokens per minute
  • Keep prompts concise to reduce token usage.
  • Use streaming to start processing output early and reduce perceived latency.
  • Cache frequent queries to avoid repeated calls.
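The caching tip above can be sketched with functools.lru_cache. Here call_codestral is a placeholder standing in for a real client.chat.complete call; the call counter exists only to illustrate that repeated prompts skip the API.

```python
# Hedged sketch: memoizing identical prompts so they hit the API once.
from functools import lru_cache

def call_codestral(prompt: str) -> str:
    # Placeholder for client.chat.complete(...); counts calls for illustration.
    call_codestral.calls += 1
    return f"response to: {prompt}"
call_codestral.calls = 0

@lru_cache(maxsize=128)
def cached_codestral(prompt: str) -> str:
    return call_codestral(prompt)
```

Note that lru_cache keys on exact argument equality, so this only helps when the same prompt string recurs verbatim.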
| Approach | Latency | Cost/call | Best for |
| --- | --- | --- | --- |
| Standard chat.complete | ~1.2 s | ~$0.003/1K tokens | Simple synchronous calls |
| Streaming chat.stream | Starts immediately, ~1.2 s total | ~$0.003/1K tokens | Long responses with better UX |
| Async chat.complete_async | ~1.2 s | ~$0.003/1K tokens | Concurrent async applications |

Quick tip

Always specify model="codestral-latest" and use the chat.complete method for best results with the Codestral API.

Common mistake

Beginners often forget to set the MISTRAL_API_KEY environment variable, or use the wrong model name, causing authentication or model-not-found errors.
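One way to avoid the opaque authentication failure is to check for the key up front. The helper name get_api_key is illustrative, not part of the SDK.

```python
# Hedged sketch: fail fast with a clear message when the key is missing.
import os

def get_api_key() -> str:
    key = os.environ.get("MISTRAL_API_KEY")
    if not key:
        raise RuntimeError(
            "MISTRAL_API_KEY is not set; export it before creating the client."
        )
    return key
```

Calling get_api_key() before constructing Mistral(api_key=...) turns a confusing 401 from the API into an immediate, actionable error.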

Verified 2026-04 · codestral-latest, mistral-large-latest