How to use Groq API in Python
Direct answer
Use the openai Python SDK with base_url="https://api.groq.com/openai/v1" and your GROQ_API_KEY to call client.chat.completions.create() with the desired Groq model and messages.
Setup
Install

```shell
pip install openai
```

Env vars

GROQ_API_KEY

Imports

```python
from openai import OpenAI
import os
```

Examples
In: Hello, how are you?
Out: Hi! I'm Groq's AI model, ready to assist you.

In: Write a Python function to reverse a string.
Out: Here's a Python function to reverse a string:

```python
def reverse_string(s):
    return s[::-1]
```

In: Explain quantum computing in simple terms.
Out: Quantum computing uses quantum bits that can be both 0 and 1 simultaneously, enabling faster problem solving for certain tasks.
Integration steps
- Install the OpenAI Python SDK and set the GROQ_API_KEY environment variable.
- Import OpenAI and initialize the client with your API key and Groq base URL.
- Prepare the chat messages array with roles and content.
- Call client.chat.completions.create() with the Groq model and messages.
- Extract the response text from response.choices[0].message.content.
- Use or display the generated text as needed.
Full code
```python
from openai import OpenAI
import os

def main():
    # Point the OpenAI SDK at Groq's OpenAI-compatible endpoint
    client = OpenAI(
        api_key=os.environ["GROQ_API_KEY"],
        base_url="https://api.groq.com/openai/v1",
    )
    messages = [{"role": "user", "content": "Hello, how are you?"}]
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=messages,
    )
    print("Response:", response.choices[0].message.content)

if __name__ == "__main__":
    main()
```

Output
Response: Hi! I'm Groq's AI model, ready to assist you.
API trace
Request

```json
{"model": "llama-3.3-70b-versatile", "messages": [{"role": "user", "content": "Hello, how are you?"}]}
```

Response

```json
{"choices": [{"message": {"content": "Hi! I'm Groq's AI model, ready to assist you."}}], "usage": {"prompt_tokens": 10, "completion_tokens": 12, "total_tokens": 22}}
```

Extract

```python
response.choices[0].message.content
```

Variants
Streaming response
Use streaming to display partial results as they arrive for better user experience with long outputs.
```python
from openai import OpenAI
import os

def main():
    client = OpenAI(
        api_key=os.environ["GROQ_API_KEY"],
        base_url="https://api.groq.com/openai/v1",
    )
    messages = [{"role": "user", "content": "Tell me a story."}]
    # stream=True yields chunks as tokens are generated
    stream = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

if __name__ == "__main__":
    main()
```

Async version
Use async calls when integrating Groq API in asynchronous Python applications for concurrency.
```python
import asyncio
import os

from openai import AsyncOpenAI

async def main():
    # Use AsyncOpenAI for async calls; the synchronous client
    # has no acreate() method in the v1 SDK
    client = AsyncOpenAI(
        api_key=os.environ["GROQ_API_KEY"],
        base_url="https://api.groq.com/openai/v1",
    )
    messages = [{"role": "user", "content": "Explain AI."}]
    response = await client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=messages,
    )
    print("Response:", response.choices[0].message.content)

if __name__ == "__main__":
    asyncio.run(main())
```

Alternative model
Use smaller or specialized Groq models such as mixtral-8x7b-32768 for faster responses or cost savings; check Groq's current model list before deploying, since available model IDs change over time.
```python
from openai import OpenAI
import os

def main():
    client = OpenAI(
        api_key=os.environ["GROQ_API_KEY"],
        base_url="https://api.groq.com/openai/v1",
    )
    messages = [{"role": "user", "content": "Summarize the latest tech news."}]
    response = client.chat.completions.create(
        model="mixtral-8x7b-32768",
        messages=messages,
    )
    print("Summary:", response.choices[0].message.content)

if __name__ == "__main__":
    main()
```

Performance
- Latency: ~700ms for llama-3.3-70b-versatile non-streaming calls
- Cost: ~$0.003 per 500 tokens exchanged
- Rate limits: Tier 1: 600 RPM / 36,000 TPM
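Taking the figures above at face value (~$0.003 per 500 tokens is the rate quoted in this section, not official Groq pricing), a rough per-call cost estimate can be sketched as:

```python
def estimate_cost(total_tokens: int, usd_per_500_tokens: float = 0.003) -> float:
    """Rough cost estimate; the rate is an assumed figure, not official pricing."""
    return total_tokens / 500 * usd_per_500_tokens

# The traced call above reported 22 total tokens
print(round(estimate_cost(22), 6))  # → 0.000132
```

Plug in `response.usage.total_tokens` from a real call to track spend per request.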
- Use concise prompts to reduce token usage.
- Limit max_tokens in completions to control output length.
- Reuse context efficiently by summarizing prior conversation.
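The last tip can be sketched as a hypothetical helper that keeps only the most recent messages within a rough token budget, using the common ~4 characters per token heuristic (the function name and the heuristic are illustrative assumptions, not part of the Groq API):

```python
def trim_history(messages, max_tokens=1000):
    """Keep the most recent messages whose estimated token count fits the budget.

    Tokens are estimated as len(content) // 4, a rough heuristic;
    use a real tokenizer for exact budgeting.
    """
    kept, used = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        est = max(1, len(msg["content"]) // 4)
        if used + est > max_tokens:
            break  # budget exhausted; drop this and all older messages
        kept.append(msg)
        used += est
    return list(reversed(kept))  # restore chronological order

history = [
    {"role": "user", "content": "a" * 4000},      # ~1000 tokens, oldest
    {"role": "assistant", "content": "b" * 400},  # ~100 tokens
    {"role": "user", "content": "c" * 400},       # ~100 tokens, newest
]
print(len(trim_history(history, max_tokens=500)))  # → 2 (oldest message dropped)
```

Pass the trimmed list as `messages` to keep each request under your token budget.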
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard call | ~700ms | ~$0.003 | General purpose chat completions |
| Streaming call | ~700ms initial + incremental | ~$0.003 | Long responses with better UX |
| Async call | ~700ms | ~$0.003 | Concurrent or event-driven apps |
| Smaller model (mixtral-8x7b-32768) | ~400ms | ~$0.0015 | Faster, cost-effective tasks |
Quick tip
Always specify the Groq base_url when initializing the OpenAI client to ensure requests route to Groq's API endpoint.
Common mistake
Forgetting to set the base_url to Groq's endpoint causes requests to default to OpenAI's API and fail authentication.