How to set budget limits in LiteLLM
Quick answer
In LiteLLM, you cap spending by setting litellm.max_budget (a global limit in USD across all calls) and cap per-request output with the standard max_tokens parameter. When cumulative spend crosses the budget, LiteLLM raises a BudgetExceededError, so your API calls cannot silently exceed the token or cost budget you set.
PREREQUISITES
- Python 3.8+
- LiteLLM installed (pip install litellm)
- API key for your AI provider (e.g., OpenAI or Anthropic)
- Basic knowledge of Python sync or async programming
Setup
Install LiteLLM via pip and set your API key as an environment variable. This example assumes you use OpenAI as the backend.
pip install litellm

# Set your API key in the environment
export OPENAI_API_KEY="sk-..."   # Linux/macOS
setx OPENAI_API_KEY "sk-..."     # Windows cmd; in PowerShell use $env:OPENAI_API_KEY="sk-..."
Step by step
Set a global spending cap with litellm.max_budget and limit output tokens per request with max_tokens. The example below caps cumulative spend at $0.01 and output at 500 tokens per request.
import litellm
from litellm import completion

# Cap cumulative spend across all LiteLLM calls in this process (USD)
litellm.max_budget = 0.01

# Make a request within budget; max_tokens caps the completion length
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain budget limits in LiteLLM."}],
    max_tokens=500,  # limit output tokens per request
)
print("Response:", response.choices[0].message.content)
Output
Response: LiteLLM lets you set max_tokens and a global max_budget to control usage and spending.
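Under the hood, a cost budget is just arithmetic on token counts and per-token prices. The pre-flight check below is an illustrative sketch of that idea, not LiteLLM's internals; the prices and helper names are assumptions for the example.

```python
# Hypothetical pre-flight budget check (illustrative only, not LiteLLM code).
# Prices are assumed example values; check your provider's actual pricing.
PRICE_PER_INPUT_TOKEN = 0.005 / 1000   # USD per input token
PRICE_PER_OUTPUT_TOKEN = 0.015 / 1000  # USD per output token

def estimated_cost(input_tokens: int, max_output_tokens: int) -> float:
    """Upper-bound USD cost for one request, assuming max_tokens is fully used."""
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + max_output_tokens * PRICE_PER_OUTPUT_TOKEN)

def within_budget(input_tokens: int, max_output_tokens: int, max_cost: float) -> bool:
    """Return True if the worst-case request cost fits under max_cost."""
    return estimated_cost(input_tokens, max_output_tokens) <= max_cost

# A 200-token prompt with max_tokens=500 costs at most $0.0085 at these prices
print(within_budget(200, 500, 0.01))  # → True
```

This is why capping max_tokens matters: without it, the worst-case output length, and therefore the worst-case cost, is unbounded.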
Common variations
You can also track budgets per user with litellm.BudgetManager, or enforce per-key budgets on the LiteLLM proxy. Async calls made with acompletion respect the same global max_budget. Note that cost is computed from LiteLLM's model price map, so figures can differ slightly between providers.
import asyncio
import litellm
from litellm import acompletion

litellm.max_budget = 0.005  # global cap in USD, shared by sync and async calls

async def main():
    response = await acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Async budget limit example."}],
        max_tokens=300,
    )
    print("Async response:", response.choices[0].message.content)

asyncio.run(main())
Output
Async response: Budget limits work the same in async mode with LiteLLM.
Troubleshooting
- If cumulative spend exceeds litellm.max_budget, LiteLLM raises a BudgetExceededError. Catch this exception to degrade gracefully.
- Ensure your API key is valid and your provider account has billing enabled; provider-side quota errors are raised separately from budget errors.
- If reported costs look wrong, check LiteLLM's model price map for your model; litellm.completion_cost(completion_response=response) shows what a call actually cost.
from litellm import BudgetExceededError, completion

try:
    response = completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Use too many tokens."}],
    )
except BudgetExceededError:
    print("Budget limit exceeded. Please reduce usage or increase litellm.max_budget.")
Output
Budget limit exceeded. Please reduce usage or increase litellm.max_budget.
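If you run the LiteLLM proxy for a team, budgets are usually enforced there rather than in client code. A minimal config sketch, assuming the litellm_settings keys documented for the proxy (check the proxy docs for your version):

```yaml
# config.yaml for the LiteLLM proxy (key names per the proxy docs; verify)
litellm_settings:
  max_budget: 100        # total USD budget for the deployment
  budget_duration: 30d   # reset the budget every 30 days
```

The proxy also supports per-key budgets when generating virtual keys, which is the usual way to give each team or user their own spending cap.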
Key Takeaways
- Use litellm.max_budget for a global cost cap and max_tokens for per-request token limits.
- Budget limits prevent runaway usage and keep AI API spending predictable.
- Catch BudgetExceededError to handle budget overruns gracefully in your app.