How-to · Beginner · 3 min read

How to call LiteLLM with the OpenAI SDK

Quick answer
Use the OpenAI Python SDK to call models served through a local LiteLLM proxy. Instantiate the OpenAI client with base_url pointing at the proxy's endpoint, then send chat requests with client.chat.completions.create().

PREREQUISITES

  • Python 3.8+
  • An API key for the upstream provider (or a LiteLLM proxy key, if your proxy requires one)
  • pip install openai>=1.0
  • LiteLLM installed and running locally

Setup

Install the official OpenAI Python SDK and the LiteLLM proxy, start the proxy locally, and set your API key as an environment variable.

  • Install both packages: pip install openai 'litellm[proxy]'
  • Start the LiteLLM proxy (default port 4000), for example: litellm --model ollama/llama3.2
  • Export your API key: export OPENAI_API_KEY='your_api_key'
bash
pip install openai 'litellm[proxy]'
litellm --model ollama/llama3.2

Step by step

Use the OpenAI Python SDK to call LiteLLM by setting the base_url to LiteLLM's local server. This example sends a chat completion request to a local llama3.2 model served by LiteLLM.

python
import os
from openai import OpenAI

# Initialize OpenAI client with LiteLLM local server URL
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="http://localhost:4000/v1"
)

# Prepare chat messages
messages = [
    {"role": "user", "content": "Hello from LiteLLM!"}
]

# Call chat completion on local llama3.2 model
response = client.chat.completions.create(
    model="llama3.2",
    messages=messages
)

print(response.choices[0].message.content)
output
Hello from LiteLLM! How can I assist you today?
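Chat requests are just lists of role/content dictionaries, so multi-turn conversations are built by appending each exchange. A minimal helper to assemble such a list (a hypothetical convenience function, not part of either SDK):

```python
def build_messages(user_prompt, system_prompt=None, history=None):
    """Assemble an OpenAI-style chat message list.

    history is an optional list of (user_text, assistant_text)
    pairs from earlier turns, oldest first.
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for user_text, assistant_text in (history or []):
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The new prompt always goes last
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

The returned list can be passed directly as the messages argument of client.chat.completions.create().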

Common variations

You can switch models by changing the model parameter to any model your LiteLLM proxy serves, such as llama3.3-70b. For async calls, use the OpenAI SDK's AsyncOpenAI client with Python's asyncio. Streaming is also supported: pass stream=True and iterate over the returned chunks.

python
import asyncio
import os
from openai import AsyncOpenAI

async def async_chat():
    # AsyncOpenAI exposes awaitable versions of the same methods
    client = AsyncOpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url="http://localhost:4000/v1"
    )
    response = await client.chat.completions.create(
        model="llama3.3-70b",
        messages=[{"role": "user", "content": "Async call with LiteLLM"}]
    )
    print(response.choices[0].message.content)

asyncio.run(async_chat())
output
Async call with LiteLLM received. How can I help?
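With stream=True, the server returns a sequence of chunks whose choices[0].delta.content fragments concatenate into the full reply. The assembly loop can be sketched with plain dicts standing in for the SDK's chunk objects:

```python
def collect_stream(chunks):
    """Concatenate the content deltas from a stream of chunk dicts."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        content = delta.get("content")
        if content is not None:  # first/last chunks may carry no content
            parts.append(content)
    return "".join(parts)

# Simulated chunks, shaped like streaming chat-completion responses
fake_stream = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": " there"}}]},
]
print(collect_stream(fake_stream))  # prints "Hello there"
```

With the real SDK the loop is the same except the chunks are objects: iterate over client.chat.completions.create(..., stream=True) and read chunk.choices[0].delta.content.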

Troubleshooting

  • If you get connection errors, verify the LiteLLM proxy is running on localhost:4000 (or whatever port you started it on).
  • Ensure your OPENAI_API_KEY environment variable is set correctly.
  • Check that the model name matches one served by LiteLLM.
  • For timeout issues, increase client timeout or check server load.
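The first check above can be automated: before sending a request, probe whether anything is listening on the proxy's port. A small stdlib-only sketch (a hypothetical helper, not part of LiteLLM; assumes the default port 4000):

```python
import socket

def server_reachable(host="localhost", port=4000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if not server_reachable():
    print("LiteLLM proxy is not reachable on localhost:4000 -- is it running?")
```

A True result only confirms something accepts connections on that port, not that it speaks the OpenAI API, so keep the other checks as well.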

Key Takeaways

  • Use OpenAI SDK with base_url set to LiteLLM's local server to call local models.
  • Set model parameter to the LiteLLM-served model name like llama3.2.
  • Async calls are supported via OpenAI SDK's async methods with LiteLLM.
  • Ensure LiteLLM server is running and accessible on the expected port.
  • Always use environment variables for API keys to keep credentials secure.
Verified 2026-04 · llama3.2, llama3.3-70b