How to use LiteLLM with the OpenAI Assistants API
Quick answer
Point the OpenAI Python SDK at a LiteLLM proxy by setting the client's base_url, then call client.chat.completions.create() (or the Assistants endpoints) as usual, passing a model alias that the proxy has configured (here, lite-llm) in the model parameter. LiteLLM exposes an OpenAI-compatible API and routes each request to the underlying provider for that alias.

Prerequisites
- Python 3.8+
- An API key (for the proxy or the upstream provider)
- A running LiteLLM proxy (assumed below at http://localhost:4000)
- pip install openai>=1.0
Setup
Install the official OpenAI Python SDK and set your API key as an environment variable so it never appears in your code.

pip install "openai>=1.0"

Step by step
This example is a complete Python script that sends a chat completion request through a LiteLLM proxy using the OpenAI SDK and prints the assistant's reply. The proxy address and the lite-llm alias are placeholders for your own setup.
import os
from openai import OpenAI

# Point the client at the LiteLLM proxy (assumed at http://localhost:4000);
# "lite-llm" must be a model alias configured in that proxy.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="http://localhost:4000",
)

response = client.chat.completions.create(
    model="lite-llm",
    messages=[
        {"role": "user", "content": "Hello, LiteLLM! How do you work with OpenAI Assistants API?"}
    ],
)

print("Assistant reply:", response.choices[0].message.content)

Output
Assistant reply: Hello! I am LiteLLM, optimized for fast and efficient assistant responses via the OpenAI Assistants API.
Common variations
- Use different models such as gpt-4o or gpt-4o-mini by changing the model parameter (the alias must exist in your proxy's model list).
- Implement async calls with asyncio and AsyncOpenAI if your application requires concurrency.
- Stream responses by setting stream=True in the request to receive tokens incrementally.
import os
import asyncio
from openai import AsyncOpenAI

async def async_chat():
    # Async client pointed at the LiteLLM proxy (assumed at http://localhost:4000).
    client = AsyncOpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url="http://localhost:4000",
    )
    stream = await client.chat.completions.create(
        model="lite-llm",
        messages=[{"role": "user", "content": "Async call with LiteLLM."}],
        stream=True,
    )
    async for chunk in stream:
        # delta.content is None on chunks that carry no text (e.g. the final one).
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    print()

asyncio.run(async_chat())

Output
Async call with LiteLLM.
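For the lite-llm alias used above to resolve, it must be defined in the proxy's model list. A minimal config sketch, where the alias name and upstream model are assumptions to adapt to your deployment:

```yaml
# config.yaml -- maps the "lite-llm" alias to an upstream model
model_list:
  - model_name: lite-llm
    litellm_params:
      model: gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
```

Start the proxy with litellm --config config.yaml; by default it listens on port 4000, which matches the base_url used in the examples.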
Troubleshooting
- If you get authentication errors, verify that your OPENAI_API_KEY environment variable is set correctly and that the proxy accepts it.
- For model-not-found errors, confirm that lite-llm is configured in your LiteLLM proxy and check for typos.
- For timeouts, increase the client's timeout or retry the request with backoff.
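The retry advice above can be sketched as a small backoff wrapper; the wrapped function and delay values are illustrative:

```python
import time

def with_retries(call, attempts=3, base_delay=1.0):
    """Run `call`, retrying on failure with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)

# Example: a flaky function that fails twice, then succeeds.
state = {"calls": 0}

def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("transient")
    return "ok"

result = with_retries(flaky, base_delay=0.01)  # retries twice, then returns "ok"
```

In practice you would wrap the client.chat.completions.create(...) call; note that the OpenAI client constructor also accepts timeout and max_retries arguments if you prefer built-in retries.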
Key Takeaways
- Point the OpenAI Python SDK at a LiteLLM proxy via base_url and select a configured alias with model="lite-llm".
- Set your API key in the OPENAI_API_KEY environment variable before running code.
- Async and streaming calls improve responsiveness for interactive applications.
- Check model availability and API key correctness to avoid common errors.