How to add fallback models in LiteLLM proxy
Quick answer
In the LiteLLM proxy, you add fallback models in the proxy config file: define each deployment under model_list, then map a primary model to an ordered list of backup models under router_settings.fallbacks. The proxy attempts the primary model first and automatically retries the request against the fallback model(s) if the primary fails or is rate limited, keeping the endpoint highly available.
Prerequisites
- Python 3.8+
- LiteLLM proxy installed
- Basic knowledge of LiteLLM proxy configuration
- Access to multiple AI models (local or remote)
Set up the LiteLLM proxy
Install LiteLLM with the proxy extras if it is not already installed. You can install it via pip:

pip install 'litellm[proxy]'

Ensure you have access to at least two AI models to use as primary and fallback.
Step-by-step configuration
Create or edit your config.yaml file to define your models under model_list and the fallback mapping under router_settings. The proxy tries the primary model first and falls back to the next model if it encounters errors or rate limits.
model_list:
  - model_name: primary-model
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: fallback-model
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  fallbacks: [{"primary-model": ["fallback-model"]}]

Example usage with fallback
Start the proxy with litellm --config config.yaml; it listens on port 4000 by default. When you send a request for primary-model and it fails (e.g., a rate limit or provider error), the proxy automatically retries the request with fallback-model.
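Conceptually, the failover loop the router performs can be sketched in plain Python. This is not LiteLLM's actual implementation; call_model is a hypothetical stand-in for a provider call, rigged here so the primary always fails and the fallback path is exercised:

```python
def call_model(name, messages):
    # Hypothetical stand-in for a real provider call; the "primary"
    # always raises so the fallback path is taken.
    if name == "primary-model":
        raise RuntimeError("rate limited")
    return {"model": name, "content": "ok"}

def complete_with_fallbacks(messages, order):
    """Try each model in priority order, returning the first success."""
    errors = []
    for name in order:
        try:
            return call_model(name, messages)
        except Exception as exc:  # a real router only retries retryable errors
            errors.append((name, exc))
    raise RuntimeError(f"all models failed: {errors}")

result = complete_with_fallbacks(
    [{"role": "user", "content": "hi"}],
    ["primary-model", "fallback-model"],
)
print(result["model"])  # the request was served by the fallback
```

The real router adds retry limits, cooldowns, and error classification on top of this basic loop, but the priority-ordered retry is the core idea.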
Example Python client code to call the proxy:
import requests

# The proxy listens on port 4000 by default.
proxy_url = "http://127.0.0.1:4000/v1/chat/completions"
headers = {"Content-Type": "application/json"}
# If the proxy was started with a master key, add an Authorization header:
# headers["Authorization"] = "Bearer <your-key>"
data = {
    "model": "primary-model",
    "messages": [{"role": "user", "content": "Hello from LiteLLM proxy with fallback!"}],
}
response = requests.post(proxy_url, json=data, headers=headers)
print(response.json())

Output
{"id": "chatcmpl-xxx", "choices": [{"message": {"role": "assistant", "content": "..."}}]}

Common variations
- Use local models as fallbacks by pointing a model_list entry at a local backend, e.g. model: ollama/llama3 in litellm_params.
- Configure multiple fallback models by adding more names to a model's list in the fallbacks mapping.
- Set fallbacks per request by passing a fallbacks field in the request body, instead of (or in addition to) the config file.
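For example, a config that chains two fallbacks, ending with a local Ollama model, might look like the fragment below (the model names and the local api_base are illustrative):

```yaml
model_list:
  - model_name: local-fallback
    litellm_params:
      model: ollama/llama3          # served by a local Ollama instance
      api_base: http://localhost:11434

router_settings:
  fallbacks:
    - primary-model: ["fallback-model", "local-fallback"]
```

The router walks the list left to right, so the cheapest or most reliable backup should come first.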
Troubleshooting fallback issues
- If fallback does not trigger, verify that every model name in the fallbacks mapping under router_settings exactly matches a model_name defined in model_list.
- Check the proxy logs for errors indicating why the primary model failed.
- Ensure all models have valid API keys or local paths.
- Test each model independently to confirm availability.
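The first troubleshooting step above can be automated: every name referenced in the fallbacks mapping must match a model_name in model_list. A minimal checker over the parsed config (represented here as a Python dict, so no YAML parser is needed) might look like:

```python
def check_fallback_names(config):
    """Return fallback/primary names that don't appear in model_list."""
    known = {m["model_name"] for m in config.get("model_list", [])}
    missing = []
    for rule in config.get("router_settings", {}).get("fallbacks", []):
        for primary, backups in rule.items():
            for name in [primary, *backups]:
                if name not in known:
                    missing.append(name)
    return missing

config = {
    "model_list": [
        {"model_name": "primary-model"},
        {"model_name": "fallback-model"},
    ],
    "router_settings": {
        # "local-falback" is a deliberate typo to show detection.
        "fallbacks": [{"primary-model": ["fallback-model", "local-falback"]}],
    },
}
print(check_fallback_names(config))  # reports the mistyped name
```

A mismatch like this typically silently disables the fallback rule, which is why it is worth checking first.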
Key Takeaways
- Configure multiple models in LiteLLM proxy with a prioritized fallback list.
- Enable fallback by mapping each primary model to its backups under router_settings.fallbacks in the proxy config.
- Test fallback by simulating primary model failure to ensure smooth failover.
- You can mix remote and local models as fallback targets in LiteLLM proxy.
- Check proxy logs and config syntax if fallback does not behave as expected.
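One way to test failover without actually breaking the primary is LiteLLM's mock_testing_fallbacks request field, which (per the LiteLLM reliability docs) forces the primary call to fail so the configured fallback path runs end to end. A sketch of the request body:

```python
payload = {
    "model": "primary-model",
    "messages": [{"role": "user", "content": "ping"}],
    # Forces the primary deployment to raise an error so the
    # configured fallbacks are exercised.
    "mock_testing_fallbacks": True,
}
# requests.post("http://127.0.0.1:4000/v1/chat/completions", json=payload)
print(payload["mock_testing_fallbacks"])
```

If the response still comes back successfully, you know the fallback chain is wired up correctly.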