Code beginner · 3 min read

How to use LiteLLM in Python

Direct answer
Install the litellm package, set your provider's API key (for example OPENAI_API_KEY) in the environment, and call litellm.completion() with a model name and a list of messages; LiteLLM routes the call to the matching provider and returns an OpenAI-style response.

Setup

Install
bash
pip install litellm
Env vars
OPENAI_API_KEY (or the key for whichever provider your model uses, e.g. ANTHROPIC_API_KEY)
Imports
python
import os
from litellm import completion

Examples

in: Hello, LiteLLM! How are you today?
out: Hello! I'm doing well, thank you. How can I help you today?
in: Write a Python function to reverse a string.
out: def reverse_string(s): return s[::-1]
in: Explain the difference between AI and machine learning.
out: AI is the broader concept of machines performing tasks intelligently, while machine learning is a subset of AI focused on learning from data.

Integration steps

  1. Install the litellm package using pip.
  2. Export your provider's API key (for example OPENAI_API_KEY) as an environment variable.
  3. Import completion from litellm in your Python script.
  4. Build a messages list of role/content dictionaries.
  5. Call completion() with your chosen model name and the messages.
  6. Read the reply text from response.choices[0].message.content.

Full code

python
import os
from litellm import completion

# LiteLLM reads provider keys from the environment (e.g. OPENAI_API_KEY);
# export the key in your shell, never in source code.

# Prepare messages for the chat completion
messages = [{"role": "user", "content": "Hello, LiteLLM! How are you today?"}]

# LiteLLM routes the call to the provider implied by the model name
response = completion(model="gpt-4o-mini", messages=messages)

# Extract and print the reply text (OpenAI-style response object)
print("LiteLLM response:", response.choices[0].message.content)

API trace

Request
json
{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello, LiteLLM! How are you today?"}]}
Response
json
{"choices": [{"message": {"content": "Hello! I'm doing well, thank you. How can I help you today?"}}], "usage": {"prompt_tokens": 10, "completion_tokens": 15, "total_tokens": 25}}
Extract: response.choices[0].message.content
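A response with this JSON shape can be handled with the standard library alone; a minimal sketch (the payload below is abbreviated from the trace):

```python
import json

# OpenAI-style payload, as in the trace above (content abbreviated)
raw = ('{"choices": [{"message": {"content": "Hello! How can I help?"}}],'
       ' "usage": {"prompt_tokens": 10, "completion_tokens": 15,'
       ' "total_tokens": 25}}')

data = json.loads(raw)
# Same path as response.choices[0].message.content on the object form
text = data["choices"][0]["message"]["content"]
total_tokens = data["usage"]["total_tokens"]
```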

Variants

Streaming chat completions

Use streaming to display partial results immediately for better user experience with long responses.

python
import os
from litellm import completion

messages = [{"role": "user", "content": "Tell me a story."}]

# stream=True yields chunks carrying incremental deltas
for chunk in completion(model="gpt-4o-mini", messages=messages, stream=True):
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
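Alongside printing, callers usually accumulate the streamed deltas so the complete reply is available afterward. A sketch over stand-in delta strings (real ones come from the streaming chunks):

```python
def accumulate(deltas):
    """Join streamed delta fragments, skipping empty or None final chunks."""
    parts = []
    for delta in deltas:
        if delta:  # final chunks often carry an empty/None delta
            parts.append(delta)
    return "".join(parts)
```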
Async usage with LiteLLM

Use async calls to handle multiple concurrent requests efficiently in asynchronous Python applications.

python
import asyncio
from litellm import acompletion

async def main():
    messages = [{"role": "user", "content": "Explain async programming."}]
    # acompletion is the async counterpart of completion
    response = await acompletion(model="gpt-4o-mini", messages=messages)
    print("Async response:", response.choices[0].message.content)

asyncio.run(main())
Using a smaller model

Use a smaller, cheaper model for faster responses and lower cost when top accuracy is not critical; with LiteLLM, switching models or providers is just a change of the model string.

python
import os
from litellm import completion

messages = [{"role": "user", "content": "Summarize the latest news."}]

# Only the model string changes; a Claude model here needs ANTHROPIC_API_KEY
response = completion(model="claude-3-haiku-20240307", messages=messages)
print("Summary:", response.choices[0].message.content)
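Because the model is just a string, a small task-to-model map makes the cheap-versus-accurate trade-off explicit in code. The model names below are illustrative assumptions; substitute whatever your providers offer:

```python
# Illustrative model names; substitute your own providers' models
MODEL_FOR_TASK = {
    "summarize": "gpt-4o-mini",  # fast and cheap
    "code": "gpt-4o",            # stronger reasoning
}

def pick_model(task):
    """Choose a model for a task, falling back to the cheap default."""
    return MODEL_FOR_TASK.get(task, "gpt-4o-mini")
```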

Performance

Latency: depends on the provider and model; smaller models typically respond faster
Cost: billed per token by the underlying provider; see the provider's pricing page
Rate limits: enforced per provider key (and by the LiteLLM proxy if you run one)
  • Use concise prompts to reduce token usage.
  • Set the max_tokens parameter to cap output length.
  • Reuse context efficiently to avoid repeating information.

Approach        Latency                         Cost/call        Best for
Standard call   one round trip, full reply      provider-priced  General purpose, simple integration
Streaming       first tokens arrive sooner      same total       Long responses with better UX
Async call      same per call, overlaps many    same total       Concurrent requests in async apps
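The usage block returned with each response makes per-call cost tracking straightforward. The per-token prices below are placeholders, not real provider rates:

```python
# Placeholder USD prices per 1,000 tokens; check your provider's pricing page
PRICE_PER_1K = {"prompt": 0.00015, "completion": 0.0006}

def estimate_cost(prompt_tokens, completion_tokens):
    """Rough per-call cost from the response's usage counts."""
    return (prompt_tokens / 1000 * PRICE_PER_1K["prompt"]
            + completion_tokens / 1000 * PRICE_PER_1K["completion"])
```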

Quick tip

Always load your API key from environment variables and never hardcode it in your source code for security.
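A small guard turns a missing key into an immediate, readable failure instead of an opaque authentication error later. OPENAI_API_KEY is used as the default name here; substitute your provider's variable:

```python
import os

def require_key(name="OPENAI_API_KEY"):
    """Return the named API key, failing fast with a clear message."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set {name} in your environment before making calls.")
    return key
```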

Common mistake

Beginners often forget to export the provider API key (for example OPENAI_API_KEY), which surfaces as an authentication error only when completion() first contacts the provider.

Verified 2026-04 · litellm