How to use Gemma on Groq
Quick answer
Use the OpenAI Python SDK with base_url set to Groq's API endpoint and specify model="gemma2-9b-it" to call the Gemma model. Send chat messages via client.chat.completions.create() to get responses from Gemma on Groq.

Prerequisites

- Python 3.8+
- Groq API key (set as environment variable GROQ_API_KEY)
- pip install "openai>=1.0"
Setup
Install the official openai Python package and set your Groq API key as an environment variable. Use the Groq OpenAI-compatible endpoint https://api.groq.com/openai/v1 when creating the client.
```shell
pip install "openai>=1.0"
```

Step by step
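Exporting the key in a POSIX shell might look like the following (the key value is a placeholder; substitute your real Groq API key):

```shell
# Make the key visible to child processes such as the Python interpreter.
# "gsk_your_key_here" is a placeholder, not a real key.
export GROQ_API_KEY="gsk_your_key_here"
```

On Windows, the equivalent would be `setx GROQ_API_KEY "..."` (PowerShell: `$env:GROQ_API_KEY = "..."`).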
Use the OpenAI client with base_url set to Groq's endpoint and specify the gemma2-9b-it model. Send a chat completion request with your messages to get a response from Gemma.
```python
import os
from openai import OpenAI

# Point the standard OpenAI client at Groq's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
)

response = client.chat.completions.create(
    model="gemma2-9b-it",
    messages=[{"role": "user", "content": "Explain the benefits of AI in healthcare."}],
)
print(response.choices[0].message.content)
```

Output:
AI in healthcare improves diagnostics, personalizes treatment, enhances patient monitoring, and accelerates drug discovery.
Common variations
- Use streaming by adding stream=True to chat.completions.create() and iterating over chunks.
- Switch to other Groq models like llama-3.3-70b-versatile by changing the model parameter.
- Use async calls with an async client if your environment supports it.
```python
import asyncio
import os

from openai import AsyncOpenAI

async def main():
    # AsyncOpenAI (not the synchronous OpenAI class) is required here:
    # its create() call is awaitable and the stream supports "async for".
    client = AsyncOpenAI(
        api_key=os.environ["GROQ_API_KEY"],
        base_url="https://api.groq.com/openai/v1",
    )
    stream = await client.chat.completions.create(
        model="gemma2-9b-it",
        messages=[{"role": "user", "content": "Summarize the latest AI trends."}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())
```

Output:
AI trends include generative models, multimodal AI, foundation models, and increased AI adoption across industries.
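The synchronous streaming variation from the list above can be sketched as a small helper (print_stream is a hypothetical name, not part of the SDK; it accepts any iterable of OpenAI-style chunks, such as the stream returned by the non-async client with stream=True):

```python
def print_stream(stream):
    """Print streamed deltas as they arrive and return the full reply.

    Each chunk's choices[0].delta.content may be None (for example the
    initial role-only chunk), so fall back to an empty string.
    """
    parts = []
    for chunk in stream:
        text = chunk.choices[0].delta.content or ""
        print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)
```

With the synchronous client, `stream = client.chat.completions.create(..., stream=True)` yields such chunks, so `print_stream(stream)` both echoes the tokens live and returns the assembled text.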
Troubleshooting
- If you get authentication errors, verify your GROQ_API_KEY environment variable is set correctly.
- For network errors, check your internet connection and Groq API status.
- If the model is not found, confirm you are using a valid Groq model name like gemma2-9b-it.
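A small pre-flight check along these lines (require_groq_key is a hypothetical helper, not part of the SDK) can turn a vague authentication failure into an actionable error before any request is sent:

```python
import os

def require_groq_key(env=None):
    """Return the configured Groq API key or raise a clear error.

    Hypothetical helper: env defaults to os.environ; passing a dict
    makes the check easy to exercise without touching the environment.
    """
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; export it before running.")
    return key
```

Calling require_groq_key() at startup fails fast with a readable message instead of a 401 deep inside a request.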
Key Takeaways
- Use the OpenAI Python SDK with Groq's base_url to access Gemma models.
- Specify the model name exactly as "gemma2-9b-it" for Gemma on Groq.
- Streaming and async calls are supported for efficient response handling.