Code beginner · 3 min read

How to use LiteLLM in Python

Direct answer
Install the litellm package, set your provider's API key (for example OPENAI_API_KEY) in the environment, and call litellm.completion() with a model name and a list of messages; LiteLLM routes the call to the matching provider and returns an OpenAI-style response.

Setup

Install
bash
pip install litellm
Env vars
OPENAI_API_KEY (or the key for whichever provider your model uses, e.g. ANTHROPIC_API_KEY)
Imports
python
import os
from litellm import completion

Examples

in: Hello, LiteLLM! How are you today?
out: Hello! I'm doing well, thank you. How can I help you today?
in: Write a Python function to reverse a string.
out: def reverse_string(s): return s[::-1]
in: Explain the difference between AI and machine learning.
out: AI is the broader concept of machines performing tasks intelligently, while machine learning is a subset of AI focused on learning from data.

Integration steps

  1. Install the litellm package using pip.
  2. Export your provider's API key (for example OPENAI_API_KEY) as an environment variable.
  3. Import completion from litellm in your Python script.
  4. Build a messages list of role/content dictionaries.
  5. Call completion() with your chosen model name and the messages.
  6. Read the reply text from response.choices[0].message.content.

Full code

python
import os
from litellm import completion

# LiteLLM reads provider keys from the environment (e.g. OPENAI_API_KEY);
# export the key in your shell, never in source code.

# Prepare messages for the chat completion
messages = [{"role": "user", "content": "Hello, LiteLLM! How are you today?"}]

# LiteLLM routes the call to the provider implied by the model name
response = completion(model="gpt-4o-mini", messages=messages)

# Extract and print the reply text (OpenAI-style response object)
print("LiteLLM response:", response.choices[0].message.content)

API trace

Request
json
{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello, LiteLLM! How are you today?"}]}
Response
json
{"choices": [{"message": {"content": "Hello! I'm doing well, thank you. How can I help you today?"}}], "usage": {"prompt_tokens": 10, "completion_tokens": 15, "total_tokens": 25}}
Extract: response.choices[0].message.content
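A response with this JSON shape can be handled with the standard library alone; a minimal sketch (the payload below is abbreviated from the trace):

```python
import json

# OpenAI-style payload, as in the trace above (content abbreviated)
raw = ('{"choices": [{"message": {"content": "Hello! How can I help?"}}],'
       ' "usage": {"prompt_tokens": 10, "completion_tokens": 15,'
       ' "total_tokens": 25}}')

data = json.loads(raw)
# Same path as response.choices[0].message.content on the object form
text = data["choices"][0]["message"]["content"]
total_tokens = data["usage"]["total_tokens"]
```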

Variants

Streaming chat completions

Use streaming to display partial results immediately for better user experience with long responses.

python
import os
from litellm import completion

messages = [{"role": "user", "content": "Tell me a story."}]

# stream=True yields chunks carrying incremental deltas
for chunk in completion(model="gpt-4o-mini", messages=messages, stream=True):
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
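Alongside printing, callers usually accumulate the streamed deltas so the complete reply is available afterward. A sketch over stand-in delta strings (real ones come from the streaming chunks):

```python
def accumulate(deltas):
    """Join streamed delta fragments, skipping empty or None final chunks."""
    parts = []
    for delta in deltas:
        if delta:  # final chunks often carry an empty/None delta
            parts.append(delta)
    return "".join(parts)
```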
Async usage with LiteLLM

Use async calls to handle multiple concurrent requests efficiently in asynchronous Python applications.

python
import asyncio
from litellm import acompletion

async def main():
    messages = [{"role": "user", "content": "Explain async programming."}]
    # acompletion is the async counterpart of completion
    response = await acompletion(model="gpt-4o-mini", messages=messages)
    print("Async response:", response.choices[0].message.content)

asyncio.run(main())
Using a smaller model

Use a smaller, cheaper model for faster responses and lower cost when top accuracy is not critical; with LiteLLM, switching models or providers is just a change of the model string.

python
import os
from litellm import completion

messages = [{"role": "user", "content": "Summarize the latest news."}]

# Only the model string changes; a Claude model here needs ANTHROPIC_API_KEY
response = completion(model="claude-3-haiku-20240307", messages=messages)
print("Summary:", response.choices[0].message.content)
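Because the model is just a string, a small task-to-model map makes the cheap-versus-accurate trade-off explicit in code. The model names below are illustrative assumptions; substitute whatever your providers offer:

```python
# Illustrative model names; substitute your own providers' models
MODEL_FOR_TASK = {
    "summarize": "gpt-4o-mini",  # fast and cheap
    "code": "gpt-4o",            # stronger reasoning
}

def pick_model(task):
    """Choose a model for a task, falling back to the cheap default."""
    return MODEL_FOR_TASK.get(task, "gpt-4o-mini")
```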

Performance

Latency: depends on the provider and model; smaller models typically respond faster
Cost: billed per token by the underlying provider; see the provider's pricing page
Rate limits: enforced per provider key (and by the LiteLLM proxy if you run one)
  • Use concise prompts to reduce token usage.
  • Set the max_tokens parameter to cap output length.
  • Reuse context efficiently to avoid repeating information.

Approach        Latency                         Cost/call        Best for
Standard call   one round trip, full reply      provider-priced  General purpose, simple integration
Streaming       first tokens arrive sooner      same total       Long responses with better UX
Async call      same per call, overlaps many    same total       Concurrent requests in async apps
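The usage block returned with each response makes per-call cost tracking straightforward. The per-token prices below are placeholders, not real provider rates:

```python
# Placeholder USD prices per 1,000 tokens; check your provider's pricing page
PRICE_PER_1K = {"prompt": 0.00015, "completion": 0.0006}

def estimate_cost(prompt_tokens, completion_tokens):
    """Rough per-call cost from the response's usage counts."""
    return (prompt_tokens / 1000 * PRICE_PER_1K["prompt"]
            + completion_tokens / 1000 * PRICE_PER_1K["completion"])
```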

Quick tip

Always load your API key from environment variables and never hardcode it in your source code for security.
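A small guard turns a missing key into an immediate, readable failure instead of an opaque authentication error later. OPENAI_API_KEY is used as the default name here; substitute your provider's variable:

```python
import os

def require_key(name="OPENAI_API_KEY"):
    """Return the named API key, failing fast with a clear message."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set {name} in your environment before making calls.")
    return key
```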

Common mistake

Beginners often forget to export the provider API key (for example OPENAI_API_KEY), which surfaces as an authentication error only when completion() first contacts the provider.

Verified 2026-04 · litellm