How to · Beginner · 3 min read

How to call Mistral with LiteLLM

Quick answer
Use the completion function from the litellm Python package, specifying the model as mistral/mistral-large-latest and passing your prompt as a chat message. LiteLLM routes the request to Mistral's API and returns the completion in an OpenAI-style response object.

PREREQUISITES

  • Python 3.8+
  • pip install litellm
  • A Mistral API key exported as MISTRAL_API_KEY, or access to a LiteLLM proxy server configured with Mistral models

Setup

Install the litellm Python package and set your Mistral API key in the MISTRAL_API_KEY environment variable. Alternatively, you can route requests through a LiteLLM proxy server, run locally or hosted remotely.

bash
pip install litellm
export MISTRAL_API_KEY="your-api-key"

Step by step

This example calls the mistral-large-latest model through litellm's completion function. It sends a single user message and prints the generated text.

python
from litellm import completion

# Reads MISTRAL_API_KEY from the environment; the mistral/ prefix
# routes the request to Mistral's API.
response = completion(
    model="mistral/mistral-large-latest",
    messages=[{"role": "user", "content": "Write a short poem about spring."}],
)

# Print the generated text
print(response.choices[0].message.content)
output
Spring awakens with gentle breeze,
Blossoms dance on budding trees.
Sunlight warms the earth anew,
Life returns in vibrant hue.

Common variations

  • Async calls: Use acompletion with await inside an async function.
  • Different models: Replace mistral-large-latest with other Mistral variants like mistral-small-latest.
  • Custom server: Pass api_base in the completion call to route requests through a LiteLLM proxy server.
python
import asyncio
from litellm import acompletion

async def async_example():
    # acompletion is the async counterpart of completion
    response = await acompletion(
        model="mistral/mistral-large-latest",
        messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}],
    )
    print(response.choices[0].message.content)

asyncio.run(async_example())
output
Quantum computing uses quantum bits, or qubits, which can be both 0 and 1 at the same time, allowing computers to solve certain problems much faster than classical computers.
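For the custom-server variation, the sketch below builds the arguments for a completion call aimed at a LiteLLM proxy. The proxy address (http://localhost:4000) and the exposed model name are assumptions for illustration; the openai/ prefix tells litellm to speak the OpenAI-compatible protocol that the proxy serves.

```python
def proxy_kwargs(prompt: str) -> dict:
    """Arguments for litellm.completion() targeting a LiteLLM proxy."""
    return {
        # openai/ prefix: treat the endpoint as OpenAI-compatible,
        # which is the protocol a LiteLLM proxy exposes.
        "model": "openai/mistral-large-latest",
        "api_base": "http://localhost:4000",  # assumed proxy address
        "messages": [{"role": "user", "content": prompt}],
    }

kwargs = proxy_kwargs("Write a short poem about spring.")
# To send the request, unpack the arguments into completion:
# from litellm import completion
# response = completion(**kwargs)
# print(response.choices[0].message.content)
print(kwargs["model"])
```

Keeping the request arguments in one helper makes it easy to switch between calling Mistral directly (mistral/ prefix, no api_base) and going through a proxy.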

Troubleshooting

  • If you get authentication errors, confirm that MISTRAL_API_KEY is set; for connection errors, verify that your LiteLLM proxy server (if you use one) is running and reachable.
  • For model-not-found errors, confirm the model name (including the mistral/ prefix) is spelled correctly and available on your LiteLLM instance.
  • Check your Python environment and litellm installation if import errors occur.
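Before debugging deeper, a quick environment check can rule out the common causes above. This is a minimal sketch; checking MISTRAL_API_KEY assumes you call Mistral directly rather than through a proxy.

```python
import importlib.util
import os
import sys

# One boolean per common failure mode from the troubleshooting list.
checks = {
    "python>=3.8": sys.version_info >= (3, 8),
    "litellm installed": importlib.util.find_spec("litellm") is not None,
    "MISTRAL_API_KEY set": bool(os.environ.get("MISTRAL_API_KEY")),
}
for name, ok in checks.items():
    print(f"{'OK  ' if ok else 'FAIL'} {name}")
```

If "litellm installed" fails, rerun pip install litellm inside the same environment your script uses.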

Key Takeaways

  • Use the litellm package's completion function to call Mistral models with a few lines of Python.
  • Specify the Mistral model name, prefixed with mistral/, to target different model sizes.
  • Async calls (acompletion) and proxy routing (api_base) provide flexibility for advanced LiteLLM usage.
  • Set MISTRAL_API_KEY, or make sure your LiteLLM proxy is reachable, to avoid authentication and connection errors.
  • Keep litellm updated to support the latest Mistral models and features.
Verified 2026-04 · mistral-large-latest, mistral-small-latest