How to add models to LiteLLM proxy
Quick answer
To add models to the LiteLLM proxy, define them under model_list in the proxy's YAML config file (here models.yaml), giving each entry a model_name that clients will use and the litellm_params that tell LiteLLM how to reach the backend. Then restart the proxy server so it loads the new models and serves them through its OpenAI-compatible API.
Prerequisites
- Python 3.8+
- LiteLLM installed with the proxy extras (pip install 'litellm[proxy]')
- Basic knowledge of YAML configuration files
- API keys or network access to the model backends you want to route to
Set up the LiteLLM proxy
Install LiteLLM with the proxy extras and export any provider API keys as environment variables. The proxy runs locally and routes requests to multiple models based on a single configuration file.
pip install 'litellm[proxy]'
Step-by-step model addition
Edit the models.yaml file used by your LiteLLM proxy to add new models under model_list. Each entry needs a unique model_name (the alias clients send in requests) and a litellm_params block whose model field names the provider and underlying model (e.g. openai/gpt-4o or ollama/llama3), plus any api_base or credentials the backend requires. After saving, restart the proxy to load the models.
models.yaml example:
model_list:
  - model_name: llama-3b
    litellm_params:
      model: ollama/llama3               # local model served by Ollama
      api_base: http://localhost:11434
  - model_name: gptq-4bit
    litellm_params:
      model: openai/gptq-4bit            # any OpenAI-compatible server, e.g. vLLM
      api_base: http://localhost:8000/v1
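Optionally, the same file can hold proxy-wide settings such as a master key, plus per-model request defaults. A sketch, assuming the backends from the example above; sk-1234 is a placeholder value, not a real key:

```yaml
general_settings:
  master_key: sk-1234        # clients must then send Authorization: Bearer sk-1234

model_list:
  - model_name: llama-3b
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434
      max_tokens: 256        # optional per-model default applied to requests
```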
# Start or restart the LiteLLM proxy (listens on port 4000 by default)
litellm --config models.yaml
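Before (re)starting the proxy, it can help to sanity-check the file's structure. A minimal sketch using PyYAML; the CONFIG string and the validate_config helper are illustrative, not part of LiteLLM:

```python
import yaml  # PyYAML: pip install pyyaml

CONFIG = """
model_list:
  - model_name: llama-3b
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434
  - model_name: gptq-4bit
    litellm_params:
      model: openai/gptq-4bit
      api_base: http://localhost:8000/v1
"""

def validate_config(text):
    """Return the model_name aliases, raising if required keys are missing."""
    cfg = yaml.safe_load(text)
    names = []
    for entry in cfg.get("model_list", []):
        if "model_name" not in entry or "litellm_params" not in entry:
            raise ValueError(f"bad entry: {entry}")
        if "model" not in entry["litellm_params"]:
            raise ValueError(f"missing litellm_params.model: {entry}")
        names.append(entry["model_name"])
    return names

print(validate_config(CONFIG))  # ['llama-3b', 'gptq-4bit']
```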
# Python example to query the proxy
import requests

url = 'http://localhost:4000/v1/chat/completions'
headers = {'Content-Type': 'application/json'}
# If you set a master_key, add: headers['Authorization'] = 'Bearer <your key>'
data = {
    'model': 'llama-3b',  # the model_name alias from models.yaml
    'messages': [{'role': 'user', 'content': 'Hello LiteLLM!'}]
}
response = requests.post(url, json=data, headers=headers)
print(response.json())
output
{"id": "chatcmpl-xxx", "choices": [{"message": {"role": "assistant", "content": "Hello LiteLLM! How can I assist you today?"}}]}
Common variations
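A common variation is extracting just the assistant text rather than printing the raw JSON. Using the sample response shown above:

```python
import json

raw = '{"id": "chatcmpl-xxx", "choices": [{"message": {"role": "assistant", "content": "Hello LiteLLM! How can I assist you today?"}}]}'

resp = json.loads(raw)
# Chat completion responses put the assistant text at choices[0].message.content.
reply = resp["choices"][0]["message"]["content"]
print(reply)  # Hello LiteLLM! How can I assist you today?
```

In practice you would call response.json() instead of json.loads(raw); the indexing is the same.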
You can route to models from many providers and backends, such as OpenAI, Anthropic, Ollama, vLLM, or Hugging Face, by choosing the provider prefix in litellm_params.model. Quantization and device options belong to the backend that actually serves the weights (e.g. Ollama or vLLM), not to LiteLLM itself. For asynchronous usage, query the proxy API with an async HTTP client like httpx.
import asyncio
import httpx

async def query_litellm():
    async with httpx.AsyncClient() as client:
        data = {
            'model': 'gptq-4bit',  # the model_name alias from models.yaml
            'messages': [{'role': 'user', 'content': 'Async query example'}]
        }
        response = await client.post('http://localhost:4000/v1/chat/completions', json=data)
        print(response.json())

asyncio.run(query_litellm())
output
{"id": "chatcmpl-yyy", "choices": [{"message": {"role": "assistant", "content": "This is an async response from LiteLLM proxy."}}]}
Troubleshooting
- If the proxy fails to start, verify your models.yaml syntax and any paths or api_base URLs it references.
- Check that the backend servers are running and that model files or API keys are valid and accessible.
- Rerun the proxy with litellm --config models.yaml --detailed_debug to inspect runtime errors.
- Ensure no port conflicts on 4000 or your configured proxy port.
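The port-conflict check in the last bullet can be done from Python's standard library; a small sketch that assumes 4000 as the proxy port:

```python
import socket

def port_in_use(host, port):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        # connect_ex returns 0 when a connection succeeds, i.e. the port is taken.
        return s.connect_ex((host, port)) == 0

# Check the proxy port before starting LiteLLM (result depends on your machine).
print(port_in_use("127.0.0.1", 4000))
```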
Key Takeaways
- Add models by listing them under model_list in the LiteLLM config file, each with a model_name and litellm_params.
- Restart the LiteLLM proxy server after config changes to load new models.
- Use HTTP requests to query models served by LiteLLM proxy locally or remotely.
- Support for many providers and async querying adds flexibility.
- Check logs and config syntax if the proxy fails to load models or start.