How-to · Beginner · 3 min read

How to use Llama with LangChain

Quick answer
To use Llama with LangChain, pick an OpenAI-compatible provider such as Groq or Together AI, then instantiate ChatOpenAI from langchain_openai with the provider's base_url, API key, and model name to run chat completions seamlessly.

PREREQUISITES

  • Python 3.8+
  • OpenAI-compatible API key from a Llama provider (e.g., Groq, Together AI)
  • pip install "openai>=1.0" langchain-openai (quote the version specifier so the shell does not interpret >)

Setup

Install the required Python packages and set your environment variables for the Llama provider API key. Use an OpenAI-compatible Llama API endpoint such as Groq or Together AI.

bash
pip install openai langchain-openai
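Then export your provider API key so the examples below can read it from the environment (Groq shown; the key value is a placeholder):

```shell
export GROQ_API_KEY="your-api-key-here"
```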

Step by step

This example shows how to use LangChain with a Llama model hosted by Groq. Set the GROQ_API_KEY environment variable to your API key and point base_url at Groq's OpenAI-compatible endpoint.

python
import os
from langchain_openai import ChatOpenAI

# Set the GROQ_API_KEY environment variable before running.
# ChatOpenAI takes the provider's credentials and endpoint directly,
# so no separate OpenAI client object is needed.
chat = ChatOpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
    model="llama-3.3-70b-versatile",
    temperature=0.7,
)

# Run a chat completion
response = chat.invoke([{"role": "user", "content": "Explain LangChain integration with Llama."}])
print(response.content)
output

LangChain can use Llama models by configuring the OpenAI-compatible client with the Llama provider's API endpoint and model name, enabling seamless chat completions.

Common variations

  • Use Together AI by changing base_url to https://api.together.xyz/v1 and model to meta-llama/Llama-3.3-70B-Instruct-Turbo.
  • For async calls, use await chat.ainvoke([...]) inside an async function.
  • Adjust temperature or max_tokens in ChatOpenAI for different output styles.
python
import asyncio

async def async_example():
    response = await chat.ainvoke([{"role": "user", "content": "What is LangChain?"}])
    print(response.content)

asyncio.run(async_example())
output
LangChain is a framework for building applications with language models, enabling chaining of prompts and integration with various AI providers.

Troubleshooting

  • If you get authentication errors, verify your API key is set correctly in os.environ.
  • For model not found errors, confirm the model name matches the provider's current offerings.
  • Timeouts may require increasing client timeout settings or checking network connectivity.

Key Takeaways

  • Use OpenAI-compatible clients with provider-specific base URLs to access Llama models in LangChain.
  • Configure ChatOpenAI with the provider's model name for seamless integration.
  • Async and streaming calls are supported by LangChain's ChatOpenAI interface.
  • Always set API keys securely via environment variables to avoid authentication issues.
  • Check provider documentation for up-to-date model names and endpoints.
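As a quick sanity check before running any of the examples, a pure-Python sketch that fails fast when a key is missing (the helper name require_key is illustrative, not part of any library):

```python
import os


def require_key(name: str) -> str:
    """Return the named API key from the environment or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running")
    return value


# Example: require_key("GROQ_API_KEY") raises if the variable is missing
```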
Verified 2026-04 · llama-3.3-70b-versatile, meta-llama/Llama-3.3-70B-Instruct-Turbo