How to use Llama via Groq API
Quick answer
Use the `openai` Python SDK with `base_url` set to Groq's API endpoint and your Groq API key read from the environment. Call `client.chat.completions.create` with the model `llama-3.3-70b-versatile` and your chat messages to interact with Llama via Groq.

Prerequisites

- Python 3.8+
- Groq API key (set in environment variable `GROQ_API_KEY`)
- `pip install "openai>=1.0"`
Setup
Install the `openai` Python package and set your Groq API key in the environment variable `GROQ_API_KEY`. The Groq API is OpenAI-compatible, so you use the OpenAI client with a custom `base_url`.

```shell
pip install "openai>=1.0"
```

Step by step
Use the following Python code to call the Llama model `llama-3.3-70b-versatile` via the Groq API. It sends a chat completion request with a single user message and prints the response.
```python
import os
from openai import OpenAI

# Groq exposes an OpenAI-compatible endpoint, so the standard client works.
client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello, how do I use Llama via Groq API?"}],
)
print(response.choices[0].message.content)
```

Output
Hello! To use Llama via the Groq API, you send chat completion requests to the Groq endpoint with the model "llama-3.3-70b-versatile" using the OpenAI-compatible SDK.
Common variations
- Use other Llama models such as `llama-3.1-8b-instant` by changing the `model` parameter.
- For streaming responses, pass `stream=True` to `chat.completions.create` and iterate over the response.
- Use async calls with `AsyncOpenAI`, the async client in the same package.
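If you are unsure which Llama model IDs your key can use, the OpenAI-compatible `/models` endpoint can be queried. A minimal sketch; `llama_only` is our own helper, not part of the SDK:

```python
def llama_only(model_ids):
    # Keep only IDs that look like Llama models, sorted for stable output.
    return sorted(i for i in model_ids if "llama" in i.lower())

if __name__ == "__main__":
    # Imported here so the helper above stays dependency-free.
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["GROQ_API_KEY"],
        base_url="https://api.groq.com/openai/v1",
    )
    print(llama_only(m.id for m in client.models.list().data))
```

Any returned ID can then be passed as the `model` parameter in `chat.completions.create`.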
For example, an async streaming call:

```python
import os
import asyncio
from openai import AsyncOpenAI

async def async_chat():
    client = AsyncOpenAI(
        api_key=os.environ["GROQ_API_KEY"],
        base_url="https://api.groq.com/openai/v1",
    )
    # stream=True yields chunks; each chunk carries an incremental delta.
    stream = await client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Streamed response example."}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")

asyncio.run(async_chat())
```

Output
Streamed response example output printed chunk by chunk.
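Streaming also works with the plain synchronous client. A sketch assuming the same model and key; `join_deltas` is our own helper for collecting the streamed text:

```python
def join_deltas(deltas):
    # Streamed chunks may carry None instead of text; treat those as empty.
    return "".join(d or "" for d in deltas)

if __name__ == "__main__":
    # Imported here so the helper above stays dependency-free.
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["GROQ_API_KEY"],
        base_url="https://api.groq.com/openai/v1",
    )
    stream = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
        stream=True,
    )
    deltas = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        print(delta or "", end="", flush=True)
        deltas.append(delta)
    print()
    full_text = join_deltas(deltas)  # the complete reply as one string
```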
Troubleshooting
- If you get authentication errors, verify that the `GROQ_API_KEY` environment variable is set correctly.
- If the model is not found, confirm you are using a valid Groq Llama model name such as `llama-3.3-70b-versatile`.
- For network errors, check your internet connection and that `https://api.groq.com/openai/v1` is reachable.
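The checks above can be automated by catching the SDK's typed exceptions. A sketch; `advice_for_status` is our own mapping, and the status codes assume Groq follows standard HTTP conventions:

```python
def advice_for_status(status_code):
    # Map common HTTP statuses to the troubleshooting checklist above.
    hints = {
        401: "Authentication failed: verify GROQ_API_KEY is set and valid.",
        404: "Model not found: use a valid Groq model name such as llama-3.3-70b-versatile.",
        429: "Rate limited: slow down or retry with backoff.",
    }
    return hints.get(status_code, "Unexpected error: inspect the exception details.")

if __name__ == "__main__":
    # Imported here so the helper above stays dependency-free.
    import os
    from openai import OpenAI, APIConnectionError, APIStatusError

    client = OpenAI(
        api_key=os.environ.get("GROQ_API_KEY", "missing"),
        base_url="https://api.groq.com/openai/v1",
    )
    try:
        client.chat.completions.create(
            model="llama-3.3-70b-versatile",
            messages=[{"role": "user", "content": "ping"}],
        )
    except APIStatusError as exc:
        print(advice_for_status(exc.status_code))
    except APIConnectionError:
        print("Network error: check that https://api.groq.com/openai/v1 is reachable.")
```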
Key Takeaways
- Use the OpenAI Python SDK with Groq's `base_url` to access Llama models via the Groq API.
- Set your Groq API key in the environment variable `GROQ_API_KEY` for authentication.
- Model names such as `llama-3.3-70b-versatile` are valid at the time of writing; consult Groq's model list for the current set.
- Streaming and async calls are supported through the OpenAI-compatible SDK.
- Check environment variables and model names first if you encounter errors.