How to use LiteLLM in Python
Direct answer
Use the litellm Python package: set your provider's API key as an environment variable, import the completion function, and call completion() with a model name and a list of chat messages.
Setup
Install
pip install litellm
Env vars
OPENAI_API_KEY (or the key for whichever provider's model you call, e.g. ANTHROPIC_API_KEY)
Imports
import os
from litellm import completion
Examples
in: Hello, LiteLLM! How are you today?
out: Hello! I'm LiteLLM, your lightweight AI assistant. How can I help you today?
in: Write a Python function to reverse a string.
out:
def reverse_string(s):
    return s[::-1]
in: Explain the difference between AI and machine learning.
out: AI is the broader concept of machines performing tasks intelligently, while machine learning is a subset of AI focused on learning from data.
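Each in/out pair above is a single-turn exchange: the in text is sent as a user message and the out text comes back as the assistant's reply. A minimal sketch of the message format litellm expects (make_messages is an illustrative helper, not part of litellm):

```python
# Each "in" prompt above becomes one user message in the chat format
def make_messages(prompt):
    """Wrap a plain prompt string in the chat message format completion() expects."""
    return [{"role": "user", "content": prompt}]

messages = make_messages("Hello, LiteLLM! How are you today?")
print(messages)  # [{'role': 'user', 'content': 'Hello, LiteLLM! How are you today?'}]
```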
Integration steps
- Install the litellm package with pip.
- Set your provider's API key (for example OPENAI_API_KEY) as an environment variable.
- Import os and litellm's completion function in your Python script.
- Build a messages list in the chat format: [{"role": "user", "content": ...}].
- Call completion() with your chosen model name and the messages.
- Extract the response text from response.choices[0].message.content.
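The final step, extracting the response text, is worth guarding, since an unexpected response shape otherwise raises an IndexError. A hedged sketch (extract_text and the stand-in response object are illustrative, not part of litellm):

```python
from types import SimpleNamespace

# Guarded version of the extraction step: pull the text out of a completion
# response, returning None instead of raising if there are no choices.
def extract_text(response):
    choices = getattr(response, "choices", None) or []
    if not choices:
        return None
    return choices[0].message.content

# Demo with a stand-in object shaped like a completion response
fake = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content="hi"))])
print(extract_text(fake))                          # hi
print(extract_text(SimpleNamespace(choices=[])))   # None
```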
Full code
import os
from litellm import completion

# litellm reads provider keys from the environment, e.g. OPENAI_API_KEY,
# which must be exported before this script runs

# Prepare messages for the chat completion
messages = [{"role": "user", "content": "Hello, LiteLLM! How are you today?"}]

# Call completion(); the model name determines which provider is used
response = completion(model="gpt-4o", messages=messages)

# Extract and print the response content
print("LiteLLM response:", response.choices[0].message.content)
API trace
Request
{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello, LiteLLM! How are you today?"}]}
Response
{"choices": [{"message": {"content": "Hello! I'm LiteLLM, your lightweight AI assistant. How can I help you today?"}}], "usage": {"prompt_tokens": 10, "completion_tokens": 15, "total_tokens": 25}}
Extract
response.choices[0].message.content
Variants
Streaming chat completions ›
Use streaming to display partial results immediately, which improves the user experience for long responses.
import os
from litellm import completion

messages = [{"role": "user", "content": "Tell me a story."}]

# Pass stream=True to iterate over incremental chunks as they arrive
for chunk in completion(model="gpt-4o", messages=messages, stream=True):
    # delta.content can be None on some chunks, so fall back to ""
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
Async usage with LiteLLM ›
Use async calls to handle multiple concurrent requests efficiently in asynchronous Python applications.
import os
import asyncio
from litellm import acompletion

async def main():
    messages = [{"role": "user", "content": "Explain async programming."}]
    # acompletion is the async counterpart of completion
    response = await acompletion(model="gpt-4o", messages=messages)
    print("Async response:", response.choices[0].message.content)

asyncio.run(main())
Using a smaller model ›
Use a smaller model such as gpt-4o-mini for faster responses and lower cost when top accuracy is not critical.
import os
from litellm import completion

messages = [{"role": "user", "content": "Summarize the latest news."}]
response = completion(model="gpt-4o-mini", messages=messages)
print("Summary:", response.choices[0].message.content)
Performance
Latency: depends on the provider and model; streaming returns the first tokens sooner than a non-streaming call
Cost: billed per token by the underlying provider; check that provider's current pricing
Rate limits: set by the underlying provider and your account tier, typically as requests and tokens per minute
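Since cost scales with token counts, you can estimate spend from the usage block a response carries. A sketch with placeholder per-token rates (substitute your provider's real prices):

```python
# Estimate dollar cost from a response's usage counts.
# The default rates are illustrative placeholders, not real prices.
def estimate_cost(prompt_tokens, completion_tokens,
                  prompt_rate=0.15 / 1_000_000, completion_rate=0.60 / 1_000_000):
    """Cost in dollars given token counts and per-token rates."""
    return prompt_tokens * prompt_rate + completion_tokens * completion_rate

# Example: the usage block {"prompt_tokens": 10, "completion_tokens": 15}
print(estimate_cost(10, 15))
```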
- Use concise prompts to reduce token usage.
- Set the max_tokens parameter to cap output length.
- Trim conversation history to the turns the model actually needs instead of resending everything.
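The last tip can be as simple as trimming old turns before each call. A hedged sketch (trim_history and the default cap of 6 messages are illustrative choices, not part of litellm):

```python
# Keep the system message (if any) plus only the most recent turns,
# so repeated calls don't resend the whole conversation.
def trim_history(messages, keep_last=6):
    system = [m for m in messages if m["role"] == "system"][:1]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

history = [{"role": "system", "content": "Be brief."}]
history += [{"role": "user", "content": f"question {i}"} for i in range(10)]
print(len(trim_history(history)))  # 7: the system message plus the last 6 turns
```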
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard call | Full response after generation completes | Provider token pricing | General purpose, simple integration |
| Streaming | First tokens arrive quickly | Same token pricing | Long responses with better UX |
| Async call | Similar per call; many calls overlap | Same token pricing | Concurrent requests in async apps |
Quick tip
Always load your API key from environment variables and never hardcode it in your source code for security.
Common mistake
Beginners often forget to export the provider API key, which surfaces as an authentication error on the first completion() call.
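One way to catch this mistake early is to check the environment before making any calls. A minimal sketch (require_env is an illustrative helper, not part of litellm):

```python
import os

# Check the provider key up front so a missing key fails with a clear message
# instead of an authentication error deep inside the first API call.
def require_env(name):
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running, e.g. export {name}=sk-...")
    return value
```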