How to use LiteLLM proxy with OpenAI SDK
Quick answer
Use the OpenAI SDK with the base_url parameter set to your LiteLLM proxy endpoint to route requests through LiteLLM. This lets you send chat completions through LiteLLM transparently using the standard client.chat.completions.create() method.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
- LiteLLM proxy running and accessible
Setup
Install the official OpenAI Python SDK and ensure your LiteLLM proxy server is running and reachable. Set your OpenAI API key as an environment variable.
- Install the OpenAI SDK: pip install openai
- Export your API key: export OPENAI_API_KEY='your_api_key'
- Have your LiteLLM proxy URL ready, e.g., http://localhost:11434/v1

Step by step
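Before wiring up the SDK, it can help to confirm the proxy actually answers. A minimal stdlib-only sketch, assuming the proxy exposes the OpenAI-compatible /models endpoint and accepts a Bearer token (both assumptions, adjust for your deployment):

```python
import os
import urllib.request
import urllib.error

def models_url(base_url: str) -> str:
    """Build the OpenAI-compatible /models endpoint from a base URL."""
    return base_url.rstrip("/") + "/models"

def check_proxy(base_url: str, api_key: str) -> bool:
    """Return True if the proxy answers the /models endpoint with HTTP 200."""
    req = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, etc.
        return False

if __name__ == "__main__":
    ok = check_proxy("http://localhost:11434/v1",
                     os.environ.get("OPENAI_API_KEY", ""))
    print("proxy reachable" if ok else "proxy not reachable")
```

If this prints "proxy not reachable", fix connectivity before debugging SDK code.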
Use the OpenAI client with the base_url parameter pointed to your LiteLLM proxy URL. Then call the chat completions endpoint as usual.
```python
import os
from openai import OpenAI

# Initialize the OpenAI client with the LiteLLM proxy base URL
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="http://localhost:11434/v1",
)

# Create a chat completion request
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from LiteLLM proxy!"}],
)
print(response.choices[0].message.content)
```

Output:

Hello from LiteLLM proxy! How can I assist you today?
Common variations
You can use any model your LiteLLM proxy is configured to serve by changing the model parameter. For asynchronous calls, use Python's asyncio with the SDK's AsyncOpenAI client. Streaming responses are also supported by passing stream=True in the request.
```python
import os
import asyncio
from openai import AsyncOpenAI

async def async_chat():
    # The async client takes the same api_key/base_url parameters
    client = AsyncOpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url="http://localhost:11434/v1",
    )
    # Async chat completion with streaming
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Stream response via LiteLLM proxy."}],
        stream=True,
    )
    # delta is an object; some chunks may carry no choices or no content
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

asyncio.run(async_chat())
```

Output:

Stream response via LiteLLM proxy.
Troubleshooting
- If you get connection errors, verify the LiteLLM proxy URL and that the proxy server is running.
- Ensure your OPENAI_API_KEY environment variable is set; even if LiteLLM does not validate it, the SDK requires an api_key value.
- Check that the LiteLLM proxy supports the model you specify.
- For SSL errors, confirm whether your proxy uses HTTPS and adjust base_url accordingly.
Key takeaways
- Set the OpenAI SDK's base_url to your LiteLLM proxy endpoint to route requests.
- Use the standard client.chat.completions.create() method with LiteLLM transparently.
- Async and streaming calls work through the SDK's AsyncOpenAI client pointed at the proxy.
- Always verify proxy URL and API key environment variables to avoid connection issues.