How-to · Beginner · 3 min read

How to call LiteLLM with the OpenAI SDK

Quick answer
Use the OpenAI Python SDK to call models served through a local LiteLLM proxy. Instantiate the OpenAI client with base_url pointing at the proxy's endpoint, then send chat requests with client.chat.completions.create().

PREREQUISITES

  • Python 3.8+
  • An API key for the upstream provider (or a LiteLLM proxy key, if your proxy requires one)
  • pip install openai>=1.0
  • LiteLLM installed and running locally

Setup

Install the official OpenAI Python SDK and the LiteLLM proxy, start the proxy locally, and set your API key as an environment variable.

  • Install both packages: pip install openai 'litellm[proxy]'
  • Start the LiteLLM proxy (default port 4000), for example: litellm --model ollama/llama3.2
  • Export your API key: export OPENAI_API_KEY='your_api_key'
bash
pip install openai 'litellm[proxy]'
litellm --model ollama/llama3.2

Step by step

Use the OpenAI Python SDK to call LiteLLM by setting the base_url to LiteLLM's local server. This example sends a chat completion request to a local llama3.2 model served by LiteLLM.

python
import os
from openai import OpenAI

# Initialize OpenAI client with LiteLLM local server URL
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="http://localhost:4000/v1"
)

# Prepare chat messages
messages = [
    {"role": "user", "content": "Hello from LiteLLM!"}
]

# Call chat completion on local llama3.2 model
response = client.chat.completions.create(
    model="llama3.2",
    messages=messages
)

print(response.choices[0].message.content)
output
Hello from LiteLLM! How can I assist you today?
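Chat requests are just lists of role/content dictionaries, so multi-turn conversations are built by appending each exchange. A minimal helper to assemble such a list (a hypothetical convenience function, not part of either SDK):

```python
def build_messages(user_prompt, system_prompt=None, history=None):
    """Assemble an OpenAI-style chat message list.

    history is an optional list of (user_text, assistant_text)
    pairs from earlier turns, oldest first.
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for user_text, assistant_text in (history or []):
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The new prompt always goes last
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

The returned list can be passed directly as the messages argument of client.chat.completions.create().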

Common variations

You can switch models by changing the model parameter to any model your LiteLLM proxy serves, such as llama3.3-70b. For async calls, use the OpenAI SDK's AsyncOpenAI client with Python's asyncio. Streaming is also supported: pass stream=True and iterate over the returned chunks.

python
import asyncio
import os
from openai import AsyncOpenAI

async def async_chat():
    # AsyncOpenAI exposes awaitable versions of the same methods
    client = AsyncOpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url="http://localhost:4000/v1"
    )
    response = await client.chat.completions.create(
        model="llama3.3-70b",
        messages=[{"role": "user", "content": "Async call with LiteLLM"}]
    )
    print(response.choices[0].message.content)

asyncio.run(async_chat())
output
Async call with LiteLLM received. How can I help?
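With stream=True, the server returns a sequence of chunks whose choices[0].delta.content fragments concatenate into the full reply. The assembly loop can be sketched with plain dicts standing in for the SDK's chunk objects:

```python
def collect_stream(chunks):
    """Concatenate the content deltas from a stream of chunk dicts."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        content = delta.get("content")
        if content is not None:  # first/last chunks may carry no content
            parts.append(content)
    return "".join(parts)

# Simulated chunks, shaped like streaming chat-completion responses
fake_stream = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": " there"}}]},
]
print(collect_stream(fake_stream))  # prints "Hello there"
```

With the real SDK the loop is the same except the chunks are objects: iterate over client.chat.completions.create(..., stream=True) and read chunk.choices[0].delta.content.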

Troubleshooting

  • If you get connection errors, verify the LiteLLM proxy is running on localhost:4000 (or whatever port you started it on).
  • Ensure your OPENAI_API_KEY environment variable is set correctly.
  • Check that the model name matches one served by LiteLLM.
  • For timeout issues, increase client timeout or check server load.
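The first check above can be automated: before sending a request, probe whether anything is listening on the proxy's port. A small stdlib-only sketch (a hypothetical helper, not part of LiteLLM; assumes the default port 4000):

```python
import socket

def server_reachable(host="localhost", port=4000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if not server_reachable():
    print("LiteLLM proxy is not reachable on localhost:4000 -- is it running?")
```

A True result only confirms something accepts connections on that port, not that it speaks the OpenAI API, so keep the other checks as well.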

Key Takeaways

  • Use OpenAI SDK with base_url set to LiteLLM's local server to call local models.
  • Set model parameter to the LiteLLM-served model name like llama3.2.
  • Async calls are supported via OpenAI SDK's async methods with LiteLLM.
  • Ensure LiteLLM server is running and accessible on the expected port.
  • Always use environment variables for API keys to keep credentials secure.
Verified 2026-04 · llama3.2, llama3.3-70b