How to use Codestral API in Python
Direct answer
Use the mistralai Python SDK, or the OpenAI-compatible OpenAI client with base_url="https://api.mistral.ai/v1", and specify model="codestral-latest" to call the Codestral API in Python.

Setup
Install
pip install mistralai

Env vars
MISTRAL_API_KEY

Imports
from mistralai import Mistral
import os

Examples
Input: Hello, can you write a Python function to reverse a string?
Output: Sure! Here's a Python function that reverses a string:
```python
def reverse_string(s):
    return s[::-1]
```
Input: Explain the concept of recursion with an example.
Output: Recursion is a programming technique where a function calls itself to solve smaller instances of a problem. For example:
```python
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)
```
Input: Generate a haiku about spring.
Output:
Gentle spring breeze blows,
Cherry blossoms softly fall,
New life wakes the earth.
Integration steps
- Install the mistralai Python package and set the MISTRAL_API_KEY environment variable.
- Import the Mistral client and initialize it with your API key from os.environ.
- Call the chat.complete method with model='codestral-latest' and a messages list containing user input.
- Extract the generated text from response.choices[0].message.content.
- Print or use the generated response in your application.
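The steps above produce a request body like the one shown in the API trace; this stdlib-only sketch builds and serializes that payload without making a network call:

```python
import json

# Build the same request body the integration steps describe
payload = {
    "model": "codestral-latest",
    "messages": [
        {"role": "user", "content": "Write a Python function to check if a number is prime."}
    ],
}

# Serialize it as the SDK would send it over HTTP
body = json.dumps(payload)
print(body)
```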
Full code

```python
from mistralai import Mistral
import os

# Initialize the Mistral client with your API key
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Define the user message
messages = [{"role": "user", "content": "Write a Python function to check if a number is prime."}]

# Call the Codestral model for chat completion
response = client.chat.complete(
    model="codestral-latest",
    messages=messages
)

# Extract and print the generated content
print("Response from Codestral:")
print(response.choices[0].message.content)
```

Output
Response from Codestral:
Here's a Python function to check if a number is prime:
```python
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True
```

API trace
Request
{"model": "codestral-latest", "messages": [{"role": "user", "content": "Write a Python function to check if a number is prime."}]}

Response
{"choices": [{"message": {"content": "Here's a Python function to check if a number is prime: ..."}}], "usage": {"prompt_tokens": 20, "completion_tokens": 80, "total_tokens": 100}}

Extract
response.choices[0].message.content

Variants
Streaming response
Use streaming to display partial results immediately for a better user experience on long responses.

```python
from mistralai import Mistral
import os

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
messages = [{"role": "user", "content": "Explain the quicksort algorithm."}]

# Streaming chat completion: each event wraps an incremental delta
for chunk in client.chat.stream(
    model="codestral-latest",
    messages=messages
):
    if chunk.data.choices[0].delta.content is not None:
        print(chunk.data.choices[0].delta.content, end="")
print()
```

Async version
Use async calls to handle multiple concurrent requests efficiently in asynchronous applications.

```python
import asyncio
import os

from mistralai import Mistral

async def main():
    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    messages = [{"role": "user", "content": "Summarize the benefits of AI."}]
    response = await client.chat.complete_async(
        model="codestral-latest",
        messages=messages
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```

Alternative model: mistral-large-latest
Use the general-purpose mistral-large-latest model for broader tasks beyond code generation.

```python
from mistralai import Mistral
import os

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
messages = [{"role": "user", "content": "Generate a poem about the ocean."}]
response = client.chat.complete(
    model="mistral-large-latest",
    messages=messages
)
print(response.choices[0].message.content)
```

Performance
Latency: ~1.2 seconds per 500 tokens for non-streaming calls
Cost: ~$0.003 per 1,000 tokens with the Codestral model
Rate limits: default tier is 300 requests per minute, 60,000 tokens per minute
- Keep prompts concise to reduce token usage.
- Use streaming to start processing output early and reduce perceived latency.
- Cache frequent queries to avoid repeated calls.
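The caching tip above can be sketched as a small wrapper. `make_cached_complete` is a hypothetical helper, not part of the mistralai SDK; the demo stubs out the API call so it runs offline:

```python
from typing import Callable, Dict

def make_cached_complete(complete: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a completion function so repeated prompts hit an in-memory cache."""
    cache: Dict[str, str] = {}

    def cached_complete(prompt: str) -> str:
        if prompt not in cache:
            cache[prompt] = complete(prompt)  # only call the API on a cache miss
        return cache[prompt]

    return cached_complete

# Demo with a stand-in for the real API call that counts invocations
calls = {"n": 0}

def fake_complete(prompt: str) -> str:
    calls["n"] += 1
    return f"answer to: {prompt}"

cached = make_cached_complete(fake_complete)
cached("Explain quicksort.")
cached("Explain quicksort.")  # served from cache, no second call
print(calls["n"])  # → 1
```

For production use you would likely bound the cache size or add expiry, since model responses can go stale.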
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard chat.complete | ~1.2s | ~$0.003/1K tokens | Simple synchronous calls |
| Streaming chat.stream | Starts immediately, ~1.2s total | ~$0.003/1K tokens | Long responses with better UX |
| Async chat.complete_async | ~1.2s | ~$0.003/1K tokens | Concurrent async applications |
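Given the rate limits above, wrapping calls in retry-with-exponential-backoff is a common pattern. `retry_with_backoff` is a hypothetical helper, not part of the SDK; the demo uses a stand-in flaky function rather than a real API call:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(fn: Callable[[], T], retries: int = 3, base_delay: float = 0.01) -> T:
    """Call fn, retrying on exceptions with exponentially growing sleeps."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...
    raise RuntimeError("unreachable")

# Demo: a call that fails twice, then succeeds on the third attempt
attempts = {"n": 0}

def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("simulated 429 rate-limit error")
    return "ok"

result = retry_with_backoff(flaky)
print(result)  # → ok
```

In a real client you would catch only rate-limit/transport errors rather than bare Exception, and use delays of a second or more.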
Quick tip
Always specify `model="codestral-latest"` and use the `chat.complete` method for best results with the Codestral API.
Common mistake
Beginners often forget to set the `MISTRAL_API_KEY` environment variable or use the wrong model name, causing authentication or model errors.
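A cheap way to avoid that mistake is a pre-flight configuration check before constructing the client. `check_config` is a hypothetical helper introduced here for illustration:

```python
def check_config(env: dict) -> list:
    """Return a list of configuration problems; empty if everything looks fine."""
    problems = []
    if not env.get("MISTRAL_API_KEY"):
        problems.append("MISTRAL_API_KEY is not set")
    return problems

# A correctly configured environment reports no problems
print(check_config({"MISTRAL_API_KEY": "sk-example"}))  # → []
```

Calling this with `dict(os.environ)` at startup turns a cryptic 401 at request time into an actionable message.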