ChatGroq: fastest inference, same LCEL interface
Why this matters
Speed matters in production: Groq's token generation can be up to roughly 10x faster than other hosted LLM APIs, depending on model and load, and LangChain's unified interface means you can swap providers without rewriting your application logic.
Explanation
ChatGroq is a LangChain chat model wrapper around Groq's inference API, which runs models on custom silicon designed for speed. It implements the same BaseChatModel interface as ChatOpenAI and ChatAnthropic, so you write one LCEL chain and swap providers by changing a single import and the line that instantiates the model.
Mechanically, ChatGroq sends prompts to Groq's cloud API (not a self-hosted runtime) and deserializes the response back into LangChain's message format. Under the hood, it inherits the streaming, token counting, and structured output methods defined in langchain-core, so you get async/await and stream() support without extra code.
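The single-import swap can be sketched as a small factory. This is a minimal illustration, not part of LangChain: `PROVIDERS` and `make_chat_model` are hypothetical names, and the sketch assumes the langchain-groq, langchain-openai, or langchain-anthropic package is installed for whichever provider you select. Imports are deferred so only the chosen provider's package needs to be present.

```python
import importlib

# Hypothetical registry: provider key -> (package module, chat model class).
PROVIDERS = {
    "groq": ("langchain_groq", "ChatGroq"),
    "openai": ("langchain_openai", "ChatOpenAI"),
    "anthropic": ("langchain_anthropic", "ChatAnthropic"),
}


def make_chat_model(provider: str, model: str, **kwargs):
    """Build a chat model for the given provider (illustrative helper)."""
    if provider not in PROVIDERS:
        raise ValueError(
            f"Unknown provider {provider!r}; expected one of {sorted(PROVIDERS)}"
        )
    module_name, class_name = PROVIDERS[provider]
    # Import lazily so only the selected provider's package must be installed.
    module = importlib.import_module(module_name)
    chat_cls = getattr(module, class_name)
    return chat_cls(model=model, **kwargs)
```

With a helper like this, switching providers means changing the two arguments passed to `make_chat_model`; the prompt, the output parser, and the pipe syntax of the chain are untouched.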
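The streaming and async support mentioned above might be used as follows. This is a sketch under assumptions: `stream_to_text` and `invoke_many` are hypothetical helper names, and `chain` is assumed to be an LCEL chain ending in StrOutputParser, so that stream() yields plain string chunks.

```python
import asyncio


def stream_to_text(chain, inputs: dict) -> str:
    """Print chunks as they arrive and return the full text."""
    parts = []
    # .stream() yields string chunks when the chain ends in StrOutputParser.
    for chunk in chain.stream(inputs):
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts)


async def invoke_many(chain, batch: list[dict]) -> list[str]:
    """Run several inputs concurrently via the async interface."""
    return await asyncio.gather(*(chain.ainvoke(item) for item in batch))
```

Because these helpers only rely on stream() and ainvoke(), they should work unchanged whether the chain is backed by ChatGroq or another chat model.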
Use ChatGroq when you need speed for high-throughput applications (chatbots, batch processing) and the available models match your task. It's a drop-in replacement for existing LCEL chains: no architecture changes required.
Analogy
Groq is like switching from a regular rental car to an airport shuttle: both get you where you need to go following the same road rules, but one is optimized for speed on a specific route.
Code
import os

from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Fail early with a clear message if the API key is missing.
api_key = os.getenv("GROQ_API_KEY")
if not api_key:
    raise ValueError("GROQ_API_KEY environment variable not set")

# Model IDs on Groq change over time; confirm this one is still listed in the Groq console.
groq_llm = ChatGroq(
    model="mixtral-8x7b-32768",
    api_key=api_key,
    temperature=0.7,
)

prompt = ChatPromptTemplate.from_template(
    "Explain {topic} in one sentence for a senior developer."
)

# LCEL pipeline: prompt -> chat model -> plain string
chain = prompt | groq_llm | StrOutputParser()

result = chain.invoke({"topic": "tokenization"})
print(result)

Example output (wording varies between runs):
Tokenization breaks text into discrete units called tokens that language models process, where each token typically represents 4 characters of English text, and the cost and processing time of inference scales linearly with token count.
What just happened?
We instantiated ChatGroq with the Mixtral model (a fast open model Groq hosts), created a prompt template, and piped it through the LLM and a string parser to build an LCEL chain. We then invoked the chain with a dictionary: it sent the formatted prompt to Groq's servers, waited for the response, parsed it into a string, and printed it. For a request this small, total latency is often under 500 ms, though the exact figure depends on the model, current load, and your network.
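Rather than assuming a latency figure, you can measure it. The helper below is a minimal sketch: `timed_invoke` is a hypothetical name, and `chain` is assumed to be any object with an invoke() method, such as the LCEL chain built above.

```python
import time


def timed_invoke(chain, inputs: dict):
    """Return (result, elapsed_seconds) for a single chain.invoke call."""
    start = time.perf_counter()
    result = chain.invoke(inputs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

For example, `result, seconds = timed_invoke(chain, {"topic": "tokenization"})` returns the answer together with the wall-clock time for the round trip, including network overhead. A single call is noisy, so average several runs before comparing providers.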
Common gotcha
Developers often initialize ChatGroq without checking that GROQ_API_KEY is set, then get a cryptic authentication error. Always validate that the key exists before instantiation. Also, Groq's model roster is smaller than OpenAI's: if you hardcode "gpt-4" as your model name, the request will be rejected because Groq does not host that model. Check the current model list in the Groq console.
Error recovery
APIConnectionError
ValueError: model_name': value must be one of
TypeError: ChatGroq() got an unexpected keyword argument
Experienced dev note
Groq is fast, but its model names and availability shift over time, and older model IDs can stop working. Store your model selection in an environment variable or database field, not a hardcoded string. Groq's rate limits are generous but not unlimited; if you're doing batch inference, their batch API may be cheaper. And don't assume Groq models have the same output distribution as OpenAI models: the same prompt can produce different quality on different providers. Run your evaluation metrics on the new provider before switching in production.
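Keeping the model ID out of the code could look like the sketch below. The variable name `GROQ_MODEL`, the constants, and `resolve_model_name` are all hypothetical choices for illustration; the fallback is simply the model ID used in this lesson and may no longer be available.

```python
import os

# Hypothetical variable name and fallback; pick whatever fits your deployment.
MODEL_ENV_VAR = "GROQ_MODEL"
DEFAULT_MODEL = "mixtral-8x7b-32768"  # the ID used in this lesson; may be retired


def resolve_model_name(env=None) -> str:
    """Read the model ID from the environment, falling back to the default."""
    env = os.environ if env is None else env
    name = env.get(MODEL_ENV_VAR, "").strip()
    return name or DEFAULT_MODEL
```

You would then pass the result to the constructor, as in `ChatGroq(model=resolve_model_name(), api_key=api_key)`, so a model change becomes a configuration change and not a code deploy.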
Check your understanding
You have an existing LCEL chain that uses ChatOpenAI and returns structured JSON. You want to test it on Groq for speed. What changes are strictly necessary, and what stays the same?
Answer hint
A correct answer explains that only the import and the LLM instantiation line change (ChatOpenAI → ChatGroq), plus the model name parameter, since Groq hosts different models. The prompt template, output parser, and chain pipeline syntax all stay identical because they conform to the same LangChain interface.