Code Beginner easy · 4 min

ChatGroq: fastest inference, same LCEL interface

What you will learn

ChatGroq gives you sub-second LLM inference using Groq's hardware without changing your LCEL chain syntax.

Why this matters

Speed matters in production: Groq's token generation can be up to about 10x faster than GPU-backed cloud APIs serving comparable models, and LangChain's unified interface means you can swap providers without rewriting your application logic.

Skip if: you need the latest proprietary frontier models (GPT-4o, Claude 3.5 Sonnet), since Groq hosts a smaller catalog of mostly open-weight models. Also skip it if your workload is latency-insensitive and you want maximum model diversity.

Explanation

ChatGroq is a LangChain chat model wrapper around Groq's inference API, which runs models on custom silicon designed for speed. It implements the same BaseChatModel interface as ChatOpenAI and ChatAnthropic, so you write one LCEL chain and swap providers by changing one import and the line that constructs the model.

Mechanically, ChatGroq sends prompts to Groq's hosted API (the models are not self-hosted) and deserializes the response into LangChain's message format. It inherits streaming, token counting, and structured output methods from langchain-core, so async/await and stream() work without extra code.
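
As a rough illustration of the streaming support mentioned above, the sketch below consumes a stream chunk by chunk. It assumes the chain ends in StrOutputParser, so each chunk is a plain string; collect_stream is a hypothetical helper, not part of LangChain, and a list of strings stands in for the real stream so the example runs offline.

```python
from typing import Iterable


def collect_stream(chunks: Iterable[str]) -> str:
    """Print string chunks as they arrive and return the full text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)  # show partial output immediately
        parts.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(parts)


# Stand-in for chain.stream(...): any iterable of strings behaves the same way.
text = collect_stream(["Token", "ization ", "splits text."])
```

With a real chain, the call would look like collect_stream(chain.stream({"topic": "tokenization"})); chain.astream() is the async counterpart.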

Use ChatGroq when you need speed for high-throughput applications (chatbots, batch processing) and the available models match your task. It's a drop-in replacement for existing LCEL chains: no architecture changes required.
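
One way to picture the provider swap is a small factory that returns a chat model by name, sketched below. The function names and the GROQ_MODEL and OPENAI_MODEL environment variables are hypothetical, the default model IDs are assumptions to check against each provider's current list, and the OpenAI branch assumes langchain-openai is installed. Imports are deferred so that only the provider you select needs to be present.

```python
import os


def make_chat_model(provider: str, temperature: float = 0.0):
    """Return a chat model for the named provider (hypothetical helper)."""
    if provider == "groq":
        from langchain_groq import ChatGroq  # reads GROQ_API_KEY from the environment

        # GROQ_MODEL is an assumed variable name; the default is only an example ID.
        return ChatGroq(
            model=os.getenv("GROQ_MODEL", "llama-3.1-8b-instant"),
            temperature=temperature,
        )
    if provider == "openai":
        from langchain_openai import ChatOpenAI  # reads OPENAI_API_KEY from the environment

        # OPENAI_MODEL is an assumed variable name; the default is only an example ID.
        return ChatOpenAI(
            model=os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
            temperature=temperature,
        )
    raise ValueError(f"Unknown provider: {provider!r}")


def build_chain(llm):
    """Build the same LCEL chain regardless of which chat model is passed in."""
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate

    prompt = ChatPromptTemplate.from_template(
        "Explain {topic} in one sentence for a senior developer."
    )
    return prompt | llm | StrOutputParser()


# Usage (needs the relevant package and API key):
# chain = build_chain(make_chat_model("groq"))
# print(chain.invoke({"topic": "tokenization"}))
```

Only make_chat_model knows about providers; the prompt, parser, and pipe syntax in build_chain stay the same whichever model is returned.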

Analogy

Groq is like switching from a regular rental car to an airport shuttle: both get you where you need to go following the same road rules, but one is optimized for speed on a specific route.

Code

python

import os
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

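# Fail fast with a clear message if the key is missing.
api_key = os.getenv("GROQ_API_KEY")
if not api_key:
    raise ValueError("GROQ_API_KEY environment variable not set")

groq_llm = ChatGroq(
    # Groq retires model IDs over time; confirm this one in the Groq console.
    model="mixtral-8x7b-32768",
    api_key=api_key,
    temperature=0.7,
)

# One template variable, {topic}, filled in at invoke time.
prompt = ChatPromptTemplate.from_template(
    "Explain {topic} in one sentence for a senior developer."
)

# LCEL pipeline: prompt -> chat model -> plain string.
chain = prompt | groq_llm | StrOutputParser()

result = chain.invoke({"topic": "tokenization"})
print(result)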

Output

Tokenization breaks text into discrete units called tokens that language models process, where each token typically represents 4 characters of English text, and the cost and processing time of inference scales linearly with token count.

What just happened?

We instantiated ChatGroq with the Mixtral model (a fast open model Groq hosts), created a prompt template, and piped it through the LLM and a string parser to build an LCEL chain, then invoked the chain with a dictionary. The chain formatted the prompt, sent it to Groq's servers, waited for the response, parsed it into a string, and printed it. For a short prompt like this, round-trip latency is typically well under a second, though it varies with model, load, and network conditions.
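
Latency is easy to measure rather than estimate. The sketch below wraps any callable in a timer; timed_call is a hypothetical helper, and a stand-in lambda replaces chain.invoke so the example runs without an API key.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Stand-in callable; swap in chain.invoke for a real measurement.
result, elapsed_ms = timed_call(
    lambda payload: payload["topic"].upper(), {"topic": "tokenization"}
)
print(f"{elapsed_ms:.1f} ms -> {result}")
```

For a real measurement, call timed_call(chain.invoke, {"topic": "tokenization"}) several times and compare the spread, since the first request often includes connection setup.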

Common gotcha

Developers often initialize ChatGroq without checking that GROQ_API_KEY is set, then hit an authentication error that does not name the missing variable. Always validate that the key exists before instantiation. Also, Groq's model roster is smaller than OpenAI's: if you hardcode an OpenAI model name such as "gpt-4", the request will fail because Groq does not serve that model. Check the current model list in the Groq console.
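
A model name can also be checked in code before a chain is built. The sketch below assumes the groq Python SDK (installed as a dependency of langchain-groq) exposes client.models.list() with a data list of objects carrying an id field; fetch_model_ids and require_model are hypothetical helpers, and the demo uses a hardcoded stand-in list so it runs offline.

```python
from typing import Iterable, List


def fetch_model_ids() -> List[str]:
    """Ask Groq which model IDs this key can use (needs GROQ_API_KEY and network)."""
    from groq import Groq  # deferred so the offline demo below needs no SDK

    client = Groq()  # reads GROQ_API_KEY from the environment
    return [model.id for model in client.models.list().data]


def require_model(model_id: str, available: Iterable[str]) -> str:
    """Return model_id if it is available, otherwise raise with the valid options."""
    options = sorted(available)
    if model_id not in options:
        raise ValueError(f"{model_id!r} is not available; choose one of {options}")
    return model_id


# Stand-in list for the demo; in practice pass fetch_model_ids() instead.
stand_in_ids = ["llama-3.1-8b-instant", "mixtral-8x7b-32768"]
chosen = require_model("llama-3.1-8b-instant", stand_in_ids)
print(chosen)
```

Running the check at startup turns a mistyped or retired model name into an immediate, readable error instead of a failure on the first request.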

Error recovery

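AuthenticationError (HTTP 401)

Your GROQ_API_KEY is invalid or expired. Generate a new key at console.groq.com and set it in your environment: export GROQ_API_KEY=your_key_here. An APIConnectionError, by contrast, usually means the request never reached Groq, so check your network, proxy, or firewall.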

ValueError: 'model_name': value must be one of

The model name is not one Groq currently serves. Check https://console.groq.com for available models. This lesson lists mixtral-8x7b-32768, llama-3.1-8b-instant, and gemma-7b-it as of April 2026, but Groq retires model IDs over time, so treat the console as the source of truth.

TypeError: ChatGroq() got an unexpected keyword argument

You're most likely using an old langchain-groq version that predates the argument, or the argument name is misspelled. Update with: pip install --upgrade langchain-groq
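
Before upgrading, it can help to see which versions are actually installed. This minimal sketch uses only the standard library; installed_version is a hypothetical helper, and the package names are the distributions this lesson relies on.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages involved in a ChatGroq chain.
for name in ("langchain-groq", "langchain-core", "groq"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Including this output in a bug report makes version-specific problems much easier to pin down.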

Experienced dev note

Groq is fast, but its model names and availability change over time. Store your model selection in an environment variable or database field, not a hardcoded string. Groq's rate limits are generous but not unlimited; if you're doing batch inference, check whether its batch API is cheaper. And don't assume Groq-hosted models have the same output distribution as OpenAI models: the same prompt can produce different quality on different providers. Run your evaluation metrics before switching providers in production.
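
The advice to evaluate before switching can be sketched as a tiny comparison harness. Everything here is illustrative: compare_providers and contains_expected are hypothetical helpers, the substring metric is a deliberately crude placeholder for your real evaluation, and canned lambdas stand in for provider calls so the example runs offline.

```python
from typing import Callable, Dict, List, Tuple


def compare_providers(
    runners: Dict[str, Callable[[str], str]],
    cases: List[Tuple[str, str]],
    score: Callable[[str, str], float],
) -> Dict[str, float]:
    """Return the mean score per provider over (question, expected) cases."""
    if not cases:
        raise ValueError("At least one evaluation case is required")
    results = {}
    for name, run in runners.items():
        total = sum(score(run(question), expected) for question, expected in cases)
        results[name] = total / len(cases)
    return results


def contains_expected(output: str, expected: str) -> float:
    """Crude placeholder metric: 1.0 if the expected text appears in the output."""
    return 1.0 if expected.lower() in output.lower() else 0.0


# Offline stand-ins; a real runner could be lambda q: chain.invoke({"topic": q}).
runners = {
    "provider_a": lambda question: "Paris is the capital of France.",
    "provider_b": lambda question: "I am not sure.",
}
cases = [("What is the capital of France?", "Paris")]
scores = compare_providers(runners, cases, contains_expected)
print(scores)
```

Keeping the runners as plain callables means the same harness works for any provider's chain without changes.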

Check your understanding

You have an existing LCEL chain that uses ChatOpenAI and returns structured JSON. You want to test it on Groq for speed. What changes are strictly necessary, and what stays the same?

Answer hint

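A correct answer explains that only the import and the LLM instantiation line change (ChatOpenAI → ChatGroq), along with the model name and the API key (GROQ_API_KEY instead of OPENAI_API_KEY). The prompt template, output parser, and chain pipeline syntax all stay identical because they conform to the same LangChain interface. Because the chain returns structured JSON, a strong answer also notes that the chosen Groq model should be checked for reliable structured output.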

Version: langchain-groq is a separate package (not part of langchain-core); install it with pip install langchain-groq. As of langchain 1.2.x, ChatGroq is fully LCEL-compatible with no deprecated patterns.

Learn to stream responses from LLMs for real-time output: chain.stream() is critical when Groq's speed is your advantage.
