How-to · Beginner · 3 min read

How to set fallback models in LiteLLM

Quick answer
In LiteLLM, you set fallback models by passing a `fallbacks` list to the `completion()` call (or by configuring fallbacks on a `Router`). LiteLLM calls the primary `model` first and, if that call errors or the model is unavailable, retries the request against each fallback in order, making inference more robust.
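Conceptually, fallback handling is just a try-in-order loop. Here is a minimal, library-free sketch of the idea (the `call_model` function is a hypothetical stand-in for any provider call, not part of LiteLLM):

```python
# Illustrative only: a try-in-order fallback loop, independent of LiteLLM.
def with_fallbacks(models, call_model, prompt):
    """Try each model in order; return the first successful response."""
    last_error = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real client would catch specific errors
            last_error = exc
    raise RuntimeError(f"All models failed; last error: {last_error}")

# Example: the first "model" always fails, so the second serves the request.
def fake_call(model, prompt):
    if model == "primary":
        raise ConnectionError("primary unavailable")
    return f"{model} says: {prompt}"

used, reply = with_fallbacks(["primary", "backup"], fake_call, "hello")
print(used, reply)  # backup backup says: hello
```

LiteLLM implements this pattern for you, with provider-aware error handling on top.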

PREREQUISITES

  • Python 3.8+
  • pip install litellm
  • API keys for the providers you plan to call (e.g. `OPENAI_API_KEY` for OpenAI models)

Setup

Install litellm via pip and export the API keys for every provider you plan to use, primary and fallback alike. Locally hosted models (e.g. via Ollama) do not need a key, but hosted providers such as OpenAI do.

bash
pip install litellm

Step by step

Pass your primary model as `model` and your backups as the `fallbacks` list. LiteLLM tries the primary first and works through the fallbacks in order until a call succeeds.

python
from litellm import completion

# Requires a provider API key, e.g. export OPENAI_API_KEY=...
# The primary model is tried first; fallbacks are tried in order on error.
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello, fallback models!"}],
    fallbacks=["gpt-4o-mini"],
)
print(response.choices[0].message.content)
output
Hello, fallback models! How can I assist you today?

Common variations

Fallbacks are passed per call, so you can use a different fallback chain for each task. Async usage goes through `acompletion()`, which accepts the same `model`, `messages`, and `fallbacks` arguments and is awaited.

python
import asyncio
from litellm import acompletion

async def async_chat():
    # acompletion mirrors completion, including the fallbacks argument
    response = await acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Async fallback test"}],
        fallbacks=["gpt-4o-mini"],
    )
    print(response.choices[0].message.content)

asyncio.run(async_chat())
output
Async fallback test received. How can I help?
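For long-lived services, fallbacks can also be configured once on a `Router` instead of per call. A configuration sketch, assuming both deployments are OpenAI models (the routing names here are illustrative):

```python
from litellm import Router

# Map routing names to concrete LiteLLM params.
router = Router(
    model_list=[
        {"model_name": "gpt-4o", "litellm_params": {"model": "gpt-4o"}},
        {"model_name": "gpt-4o-mini", "litellm_params": {"model": "gpt-4o-mini"}},
    ],
    # If a gpt-4o call fails, retry the request against gpt-4o-mini.
    fallbacks=[{"gpt-4o": ["gpt-4o-mini"]}],
)
```

Requests then go through `router.completion(...)` or `router.acompletion(...)` with the routing name as `model`, and the router applies the fallback mapping automatically.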

Troubleshooting

  • If fallbacks never trigger, double-check the model names (e.g. `gpt-4o`, not `gpt4o`) and that each model is available to your account.
  • Enable verbose logging (`litellm.set_verbose = True`) to see why the primary model failed.
  • Make sure API keys are set for the fallback providers too; a fallback with a missing key will fail just like the primary.

Key Takeaways

  • Pass fallback models as a prioritized `fallbacks` list alongside the primary `model`.
  • LiteLLM retries the request against each fallback when the primary call fails.
  • Fallbacks work for sync (`completion`) and async (`acompletion`) calls, and can be set centrally on a `Router`.
  • Verify model names and credentials for every model in the chain, or the fallback itself will fail.
  • Use verbose logging to diagnose fallback behavior and errors.
Verified 2026-04 · gpt-4o, gpt-4o-mini