How to call OpenAI with LiteLLM
Quick answer
Use LiteLLM to serve local models behind an OpenAI-compatible API, then call them with the OpenAI SDK. Instantiate the OpenAI client with base_url pointing to LiteLLM's endpoint and use client.chat.completions.create() to send chat requests.

Prerequisites

- Python 3.8+
- An OpenAI API key (a free-tier key works)
- pip install openai>=1.0
- LiteLLM installed and running locally
Setup
Install the official OpenAI Python SDK and ensure LiteLLM is installed and running locally. Set your OpenAI API key as an environment variable.
- Install the OpenAI SDK: `pip install openai`
- Start the LiteLLM server locally (this guide assumes it listens on port 11434).
- Export your API key: `export OPENAI_API_KEY='your_api_key'`
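One way to start the server is via the LiteLLM proxy CLI. This is a sketch, not the only option: it assumes a pip-based install and an Ollama backend serving llama3.2 (the `ollama/` prefix); swap in the model string for your own provider.

```shell
# Install the LiteLLM proxy (assumes a pip-based setup)
pip install 'litellm[proxy]'

# Start the proxy on port 11434, fronting a local llama3.2
# (the ollama/ prefix assumes an Ollama backend; adjust for your provider)
litellm --model ollama/llama3.2 --port 11434
```

Once the server is up, the OpenAI-compatible endpoints are available under http://localhost:11434/v1.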
Step by step
Use the OpenAI Python SDK to call LiteLLM by setting the base_url to LiteLLM's local server. This example sends a chat completion request to a local llama3.2 model served by LiteLLM.
```python
import os

from openai import OpenAI

# Initialize the OpenAI client with the LiteLLM local server URL
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="http://localhost:11434/v1",
)

# Prepare chat messages
messages = [
    {"role": "user", "content": "Hello from LiteLLM!"}
]

# Call chat completion on the local llama3.2 model
response = client.chat.completions.create(
    model="llama3.2",
    messages=messages,
)

print(response.choices[0].message.content)
```

Output:

```
Hello from LiteLLM! How can I assist you today?
```
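Because LiteLLM exposes an OpenAI-compatible HTTP API, the same request can be reproduced with curl as a quick sanity check (assumes the server from Setup is running on port 11434; the Authorization header may be optional depending on your proxy configuration):

```shell
# Hit the OpenAI-compatible chat completions endpoint directly
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello from LiteLLM!"}]
  }'
```

If curl succeeds but the Python example fails, the problem is in the client configuration rather than the server.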
Common variations
You can switch models by changing the model parameter to any model LiteLLM serves locally, such as llama3.3-70b. For async calls, use the OpenAI SDK's AsyncOpenAI client with Python's asyncio. Streaming is also supported: pass stream=True to chat.completions.create().
```python
import asyncio
import os

from openai import AsyncOpenAI

async def async_chat():
    # Async client pointed at the LiteLLM local server
    client = AsyncOpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url="http://localhost:11434/v1",
    )
    response = await client.chat.completions.create(
        model="llama3.3-70b",
        messages=[{"role": "user", "content": "Async call with LiteLLM"}],
    )
    print(response.choices[0].message.content)

asyncio.run(async_chat())
```

Output:

```
Async call with LiteLLM received. How can I help?
```
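Recent LiteLLM proxy versions also pass through OpenAI-style streaming. A minimal sketch, assuming the same local server and llama3.2 model as above (this requires the server to be running):

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="http://localhost:11434/v1",
)

# stream=True returns an iterator of chunks as the model generates tokens
stream = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Stream a short greeting."}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a delta; content can be None on role/finish chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```

Printing each delta as it arrives gives the familiar token-by-token output instead of waiting for the full completion.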
Troubleshooting
- If you get connection errors, verify the LiteLLM server is running on `localhost:11434`.
- Ensure your `OPENAI_API_KEY` environment variable is set correctly.
- Check that the model name matches one served by LiteLLM.
- For timeout issues, increase client timeout or check server load.
Key Takeaways
- Use the OpenAI SDK with `base_url` set to LiteLLM's local server to call local models.
- Set the `model` parameter to the LiteLLM-served model name, like `llama3.2`.
- Async calls are supported via the OpenAI SDK's async client with LiteLLM.
- Ensure LiteLLM server is running and accessible on the expected port.
- Always use environment variables for API keys to keep credentials secure.