How to beginner · 3 min read

How to use async completion with LiteLLM

Q: How to use async completion with LiteLLM

Use the LiteLLM Python client with async methods by importing asyncio and calling await client.completions.acreate() inside an async function. This enables non-blocking, concurrent AI completions with LiteLLM.

Quick answer

Use the LiteLLM Python client with async methods by importing asyncio and calling await client.completions.acreate() inside an async function. This enables non-blocking, concurrent AI completions with LiteLLM.

PREREQUISITES

Python 3.8+
pip install litellm
Basic knowledge of async/await in Python

Setup

Install the litellm package and ensure you have Python 3.8 or newer. No API key is required for local LiteLLM usage.

bash

pip install litellm

Step by step

Use Python's asyncio to run async completions with LiteLLM. Instantiate the client, then call acreate() on client.completions inside an async function.

python

import asyncio
from litellm import LiteLLM

async def main():
    client = LiteLLM()
    response = await client.completions.acreate(
        model="litellm-small",
        prompt="Write a short poem about AI.",
        max_tokens=50
    )
    print(response.choices[0].message.content)

asyncio.run(main())

output

AI whispers softly,
In circuits and in code,
Dreams of silicon,
In endless data flow.

Common variations

Use different models by changing the model parameter (e.g., litellm-medium).
Adjust max_tokens and other parameters for output length and style.
Combine async completions with streaming by using client.completions.astream() for token-by-token output.

python

import asyncio
from litellm import LiteLLM

async def stream_example():
    client = LiteLLM()
    async for chunk in client.completions.astream(
        model="litellm-small",
        prompt="Explain async in Python.",
        max_tokens=30
    ):
        print(chunk.choices[0].delta.get('content', ''), end='', flush=True)

asyncio.run(stream_example())

output

Async in Python allows concurrent execution by using the async and await keywords, enabling efficient I/O-bound operations.

Troubleshooting

If you get RuntimeError: This event loop is already running, use nest_asyncio or run your async code in a separate script.
Ensure your Python version supports asyncio.run() (Python 3.7+).
If acreate() is not found, verify you have the latest litellm version installed.

✅

Key Takeaways

Use await client.completions.acreate() for async completions with LiteLLM.
Run async code inside an async function with asyncio.run().
Streaming completions are available via astream() for token-wise output.

Verified 2026-04 · litellm-small, litellm-medium

Verify ↗