How to set budget limits in LiteLLM
Quick answer
In LiteLLM, you cap spending by setting litellm.max_budget (a global limit in USD across all calls) and cap per-request output with the standard max_tokens parameter. When cumulative spend crosses the budget, LiteLLM raises a BudgetExceededError, so your API calls cannot silently exceed the token or cost budget you set.
PREREQUISITES
- Python 3.8+
- LiteLLM installed (pip install litellm)
- API key for your AI provider (e.g., OpenAI or Anthropic)
- Basic knowledge of Python sync or async programming
Setup
Install LiteLLM via pip and set your API key as an environment variable. This example assumes you use OpenAI as the backend.
pip install litellm

# Set your API key in the environment
export OPENAI_API_KEY="sk-..."   # Linux/macOS
setx OPENAI_API_KEY "sk-..."     # Windows cmd; in PowerShell use $env:OPENAI_API_KEY="sk-..."
Step by step
Set a global spending cap with litellm.max_budget and limit output tokens per request with max_tokens. The example below caps cumulative spend at $0.01 and output at 500 tokens per request.
import litellm
from litellm import completion

# Cap cumulative spend across all LiteLLM calls in this process (USD)
litellm.max_budget = 0.01

# Make a request within budget; max_tokens caps the completion length
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain budget limits in LiteLLM."}],
    max_tokens=500,  # limit output tokens per request
)
print("Response:", response.choices[0].message.content)
Output
Response: LiteLLM lets you set max_tokens and a global max_budget to control usage and spending.
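Under the hood, a cost budget is just arithmetic on token counts and per-token prices. The pre-flight check below is an illustrative sketch of that idea, not LiteLLM's internals; the prices and helper names are assumptions for the example.

```python
# Hypothetical pre-flight budget check (illustrative only, not LiteLLM code).
# Prices are assumed example values; check your provider's actual pricing.
PRICE_PER_INPUT_TOKEN = 0.005 / 1000   # USD per input token
PRICE_PER_OUTPUT_TOKEN = 0.015 / 1000  # USD per output token

def estimated_cost(input_tokens: int, max_output_tokens: int) -> float:
    """Upper-bound USD cost for one request, assuming max_tokens is fully used."""
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + max_output_tokens * PRICE_PER_OUTPUT_TOKEN)

def within_budget(input_tokens: int, max_output_tokens: int, max_cost: float) -> bool:
    """Return True if the worst-case request cost fits under max_cost."""
    return estimated_cost(input_tokens, max_output_tokens) <= max_cost

# A 200-token prompt with max_tokens=500 costs at most $0.0085 at these prices
print(within_budget(200, 500, 0.01))  # → True
```

This is why capping max_tokens matters: without it, the worst-case output length, and therefore the worst-case cost, is unbounded.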
Common variations
You can also track budgets per user with litellm.BudgetManager, or enforce per-key budgets on the LiteLLM proxy. Async calls made with acompletion respect the same global max_budget. Note that cost is computed from LiteLLM's model price map, so figures can differ slightly between providers.
import asyncio
import litellm
from litellm import acompletion

litellm.max_budget = 0.005  # global cap in USD, shared by sync and async calls

async def main():
    response = await acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Async budget limit example."}],
        max_tokens=300,
    )
    print("Async response:", response.choices[0].message.content)

asyncio.run(main())
Output
Async response: Budget limits work the same in async mode with LiteLLM.
Troubleshooting
- If cumulative spend exceeds litellm.max_budget, LiteLLM raises a BudgetExceededError. Catch this exception to degrade gracefully.
- Ensure your API key is valid and your provider account has billing enabled; provider-side quota errors are raised separately from budget errors.
- If reported costs look wrong, check LiteLLM's model price map for your model; litellm.completion_cost(completion_response=response) shows what a call actually cost.
from litellm import BudgetExceededError, completion

try:
    response = completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Use too many tokens."}],
    )
except BudgetExceededError:
    print("Budget limit exceeded. Please reduce usage or increase litellm.max_budget.")
Output
Budget limit exceeded. Please reduce usage or increase litellm.max_budget.
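If you run the LiteLLM proxy for a team, budgets are usually enforced there rather than in client code. A minimal config sketch, assuming the litellm_settings keys documented for the proxy (check the proxy docs for your version):

```yaml
# config.yaml for the LiteLLM proxy (key names per the proxy docs; verify)
litellm_settings:
  max_budget: 100        # total USD budget for the deployment
  budget_duration: 30d   # reset the budget every 30 days
```

The proxy also supports per-key budgets when generating virtual keys, which is the usual way to give each team or user their own spending cap.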
Key Takeaways
- Use litellm.max_budget for a global cost cap and max_tokens for per-request token limits.
- Budget limits prevent runaway usage and keep AI API spending predictable.
- Catch BudgetExceededError to handle budget overruns gracefully in your app.