How-to · Beginner · 3 min read

How to use Groq with LiteLLM

Quick answer
Install litellm, set your Groq API key in the GROQ_API_KEY environment variable, and call litellm.completion with a model name prefixed groq/ (for example groq/llama-3.3-70b-versatile). LiteLLM routes the request to Groq's OpenAI-compatible endpoint for you. Alternatively, use the openai Python SDK directly with base_url="https://api.groq.com/openai/v1" and call chat.completions.create.

PREREQUISITES

  • Python 3.8+
  • Groq API key
  • pip install openai>=1.0
  • LiteLLM installed (pip install litellm)

Setup

Install the openai SDK and litellm Python package. Set your Groq API key as an environment variable for secure authentication.

bash
pip install openai litellm
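
To make the key available to every session, export it in your shell (the key value below is a placeholder, not a real key):

```bash
# Replace the placeholder with your actual Groq API key;
# add this line to ~/.bashrc or ~/.zshrc to persist it
export GROQ_API_KEY="gsk_your_key_here"
```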

Step by step

This example calls litellm.completion with a groq/-prefixed model name. LiteLLM reads GROQ_API_KEY from the environment and talks to Groq's OpenAI-compatible endpoint under the hood, so no manual client wiring is needed.

python
import os
from litellm import completion

# LiteLLM reads GROQ_API_KEY from the environment automatically
assert "GROQ_API_KEY" in os.environ, "Set GROQ_API_KEY before running"

# Prepare chat messages
messages = [{"role": "user", "content": "Explain the benefits of using Groq with LiteLLM."}]

# Generate a completion; the groq/ prefix tells LiteLLM to route to Groq
response = completion(model="groq/llama-3.3-70b-versatile", messages=messages)
print(response.choices[0].message.content)
output
Groq's hardware acceleration combined with LiteLLM's lightweight interface enables fast, efficient inference with large language models, reducing latency and resource usage.

Common variations

  • Use different Groq models by changing the model argument, e.g., groq/llama-3.1-8b-instant.
  • Call the OpenAI client directly without LiteLLM for more control.
  • Use async calls in async environments via litellm.acompletion, the async counterpart of completion.
python
import asyncio
from litellm import acompletion

async def async_example():
    # acompletion mirrors completion but returns an awaitable
    response = await acompletion(
        model="groq/llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello asynchronously"}]
    )
    print(response.choices[0].message.content)

asyncio.run(async_example())
output
Hello asynchronously

Troubleshooting

  • If you get authentication errors, verify your GROQ_API_KEY environment variable is set correctly.
  • For connection issues, ensure your network allows access to https://api.groq.com.
  • If the model name is invalid, check the latest Groq model list at Groq docs.
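
A quick sanity check for the first issue can be scripted; check_groq_env is a hypothetical helper, not part of either SDK:

```python
import os

def check_groq_env() -> bool:
    """Return True if GROQ_API_KEY is present and non-empty."""
    key = os.environ.get("GROQ_API_KEY", "")
    if not key:
        print("GROQ_API_KEY is missing; export it before creating a client.")
        return False
    return True
```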

Key Takeaways

  • Use the OpenAI SDK with Groq's base_url to access Groq models programmatically.
  • LiteLLM's completion and acompletion functions reach Groq via the groq/ model prefix, with no manual client setup.
  • Always set your Groq API key in the environment variable GROQ_API_KEY for authentication.
Verified 2026-04 · llama-3.3-70b-versatile, llama-3.1-8b-instant