How to add fallback models in LiteLLM proxy
Quick answer
In the LiteLLM proxy, you add fallback models in the proxy config file: define each deployment under model_list, then map a primary model to an ordered list of backup models under router_settings.fallbacks. The proxy attempts the primary model first and automatically retries the request against the fallback model(s) if the primary fails or is rate limited, keeping the endpoint highly available.
Prerequisites
- Python 3.8+
- LiteLLM proxy installed
- Basic knowledge of LiteLLM proxy configuration
- Access to multiple AI models (local or remote)
Set up the LiteLLM proxy
Install LiteLLM with the proxy extras if it is not already installed. You can install it via pip:

pip install 'litellm[proxy]'

Ensure you have access to at least two AI models to use as primary and fallback.
Step-by-step configuration
Create or edit your config.yaml file to define your models under model_list and the fallback mapping under router_settings. The proxy tries the primary model first and falls back to the next model if it encounters errors or rate limits.
model_list:
  - model_name: primary-model
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: fallback-model
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  fallbacks: [{"primary-model": ["fallback-model"]}]

Example usage with fallback
Start the proxy with litellm --config config.yaml; it listens on port 4000 by default. When you send a request for primary-model and it fails (e.g., a rate limit or provider error), the proxy automatically retries the request with fallback-model.
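Conceptually, the failover loop the router performs can be sketched in plain Python. This is not LiteLLM's actual implementation; call_model is a hypothetical stand-in for a provider call, rigged here so the primary always fails and the fallback path is exercised:

```python
def call_model(name, messages):
    # Hypothetical stand-in for a real provider call; the "primary"
    # always raises so the fallback path is taken.
    if name == "primary-model":
        raise RuntimeError("rate limited")
    return {"model": name, "content": "ok"}

def complete_with_fallbacks(messages, order):
    """Try each model in priority order, returning the first success."""
    errors = []
    for name in order:
        try:
            return call_model(name, messages)
        except Exception as exc:  # a real router only retries retryable errors
            errors.append((name, exc))
    raise RuntimeError(f"all models failed: {errors}")

result = complete_with_fallbacks(
    [{"role": "user", "content": "hi"}],
    ["primary-model", "fallback-model"],
)
print(result["model"])  # the request was served by the fallback
```

The real router adds retry limits, cooldowns, and error classification on top of this basic loop, but the priority-ordered retry is the core idea.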
Example Python client code to call the proxy:
import requests

# The proxy listens on port 4000 by default.
proxy_url = "http://127.0.0.1:4000/v1/chat/completions"
headers = {"Content-Type": "application/json"}
# If the proxy was started with a master key, add an Authorization header:
# headers["Authorization"] = "Bearer <your-key>"
data = {
    "model": "primary-model",
    "messages": [{"role": "user", "content": "Hello from LiteLLM proxy with fallback!"}],
}
response = requests.post(proxy_url, json=data, headers=headers)
print(response.json())

Output
{"id": "chatcmpl-xxx", "choices": [{"message": {"role": "assistant", "content": "..."}}]}

Common variations
- Use local models as fallbacks by pointing a model_list entry at a local backend, e.g. model: ollama/llama3 in litellm_params.
- Configure multiple fallback models by adding more names to a model's list in the fallbacks mapping.
- Set fallbacks per request by passing a fallbacks field in the request body, instead of (or in addition to) the config file.
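For example, a config that chains two fallbacks, ending with a local Ollama model, might look like the fragment below (the model names and the local api_base are illustrative):

```yaml
model_list:
  - model_name: local-fallback
    litellm_params:
      model: ollama/llama3          # served by a local Ollama instance
      api_base: http://localhost:11434

router_settings:
  fallbacks:
    - primary-model: ["fallback-model", "local-fallback"]
```

The router walks the list left to right, so the cheapest or most reliable backup should come first.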
Troubleshooting fallback issues
- If fallback does not trigger, verify that every model name in the fallbacks mapping under router_settings exactly matches a model_name defined in model_list.
- Check the proxy logs for errors indicating why the primary model failed.
- Ensure all models have valid API keys or local paths.
- Test each model independently to confirm availability.
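The first troubleshooting step above can be automated: every name referenced in the fallbacks mapping must match a model_name in model_list. A minimal checker over the parsed config (represented here as a Python dict, so no YAML parser is needed) might look like:

```python
def check_fallback_names(config):
    """Return fallback/primary names that don't appear in model_list."""
    known = {m["model_name"] for m in config.get("model_list", [])}
    missing = []
    for rule in config.get("router_settings", {}).get("fallbacks", []):
        for primary, backups in rule.items():
            for name in [primary, *backups]:
                if name not in known:
                    missing.append(name)
    return missing

config = {
    "model_list": [
        {"model_name": "primary-model"},
        {"model_name": "fallback-model"},
    ],
    "router_settings": {
        # "local-falback" is a deliberate typo to show detection.
        "fallbacks": [{"primary-model": ["fallback-model", "local-falback"]}],
    },
}
print(check_fallback_names(config))  # reports the mistyped name
```

A mismatch like this typically silently disables the fallback rule, which is why it is worth checking first.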
Key Takeaways
- Configure multiple models in LiteLLM proxy with a prioritized fallback list.
- Enable fallback by mapping each primary model to its backups under router_settings.fallbacks in the proxy config.
- Test fallback by simulating primary model failure to ensure smooth failover.
- You can mix remote and local models as fallback targets in LiteLLM proxy.
- Check proxy logs and config syntax if fallback does not behave as expected.
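One way to test failover without actually breaking the primary is LiteLLM's mock_testing_fallbacks request field, which (per the LiteLLM reliability docs) forces the primary call to fail so the configured fallback path runs end to end. A sketch of the request body:

```python
payload = {
    "model": "primary-model",
    "messages": [{"role": "user", "content": "ping"}],
    # Forces the primary deployment to raise an error so the
    # configured fallbacks are exercised.
    "mock_testing_fallbacks": True,
}
# requests.post("http://127.0.0.1:4000/v1/chat/completions", json=payload)
print(payload["mock_testing_fallbacks"])
```

If the response still comes back successfully, you know the fallback chain is wired up correctly.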