How-to · Beginner · 3 min read

How to use Groq with LiteLLM

Quick answer
Install litellm, set your Groq API key in the GROQ_API_KEY environment variable, and call litellm.completion with a model name prefixed groq/ (for example groq/llama-3.3-70b-versatile). LiteLLM routes the request to Groq's OpenAI-compatible endpoint for you. Alternatively, use the openai Python SDK directly with base_url="https://api.groq.com/openai/v1" and call chat.completions.create.

PREREQUISITES

  • Python 3.8+
  • Groq API key
  • pip install openai>=1.0
  • LiteLLM installed (pip install litellm)

Setup

Install the openai SDK and litellm Python package. Set your Groq API key as an environment variable for secure authentication.

bash
pip install openai litellm
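
To make the key available to every session, export it in your shell (the key value below is a placeholder, not a real key):

```bash
# Replace the placeholder with your actual Groq API key;
# add this line to ~/.bashrc or ~/.zshrc to persist it
export GROQ_API_KEY="gsk_your_key_here"
```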

Step by step

This example calls litellm.completion with a groq/-prefixed model name. LiteLLM reads GROQ_API_KEY from the environment and talks to Groq's OpenAI-compatible endpoint under the hood, so no manual client wiring is needed.

python
import os
from litellm import completion

# LiteLLM reads GROQ_API_KEY from the environment automatically
assert "GROQ_API_KEY" in os.environ, "Set GROQ_API_KEY before running"

# Prepare chat messages
messages = [{"role": "user", "content": "Explain the benefits of using Groq with LiteLLM."}]

# Generate a completion; the groq/ prefix tells LiteLLM to route to Groq
response = completion(model="groq/llama-3.3-70b-versatile", messages=messages)
print(response.choices[0].message.content)
output
Groq's hardware acceleration combined with LiteLLM's lightweight interface enables fast, efficient inference with large language models, reducing latency and resource usage.

Common variations

  • Use different Groq models by changing the model argument, e.g., groq/llama-3.1-8b-instant.
  • Call the OpenAI client directly without LiteLLM for more control.
  • Use async calls in async environments via litellm.acompletion, the async counterpart of completion.
python
import asyncio
from litellm import acompletion

async def async_example():
    # acompletion mirrors completion but returns an awaitable
    response = await acompletion(
        model="groq/llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello asynchronously"}]
    )
    print(response.choices[0].message.content)

asyncio.run(async_example())
output
Hello asynchronously

Troubleshooting

  • If you get authentication errors, verify your GROQ_API_KEY environment variable is set correctly.
  • For connection issues, ensure your network allows access to https://api.groq.com.
  • If the model name is invalid, check the latest Groq model list at Groq docs.
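
A quick sanity check for the first issue can be scripted; check_groq_env is a hypothetical helper, not part of either SDK:

```python
import os

def check_groq_env() -> bool:
    """Return True if GROQ_API_KEY is present and non-empty."""
    key = os.environ.get("GROQ_API_KEY", "")
    if not key:
        print("GROQ_API_KEY is missing; export it before creating a client.")
        return False
    return True
```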

Key Takeaways

  • Use the OpenAI SDK with Groq's base_url to access Groq models programmatically.
  • LiteLLM's completion and acompletion functions reach Groq via the groq/ model prefix, with no manual client setup.
  • Always set your Groq API key in the environment variable GROQ_API_KEY for authentication.
Verified 2026-04 · llama-3.3-70b-versatile, llama-3.1-8b-instant