
How to use vLLM with LangChain

Quick answer
Use the vllm Python package to serve an LLM locally, then connect it to LangChain by pointing LangChain's ChatOpenAI client at the OpenAI-compatible API endpoint that vLLM exposes. Start the server with the vLLM CLI, then set base_url to the local server's address; no OpenAI account or API key is required.

Prerequisites

  • Python 3.8+
  • pip install vllm langchain_openai openai
  • vLLM model files downloaded or accessible
  • Basic knowledge of LangChain and OpenAI API usage

Setup

Install the required packages and prepare the vLLM model server. You need vllm for running the model locally and langchain_openai for LangChain integration.

bash
pip install vllm langchain_openai openai

Step by step

Start the vLLM server locally, then use LangChain's ChatOpenAI client configured to call the local vLLM server via base_url. This example runs the meta-llama/Llama-3.1-8B-Instruct model.

python
import os
from langchain_openai import ChatOpenAI

# Step 1: Start vLLM server in terminal (run this separately):
# vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

# Step 2: Use LangChain to query the local vLLM server
client = ChatOpenAI(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the model the vLLM server is serving
    api_key="EMPTY",  # placeholder; a local vLLM server does not check the key by default
    base_url="http://localhost:8000/v1"
)

response = client.invoke([{"role": "user", "content": "Write a Python function to reverse a string."}])
print(response.content)
output
def reverse_string(s):
    return s[::-1]
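Under the hood, ChatOpenAI simply POSTs JSON to the server's OpenAI-compatible /v1/chat/completions route. The same request can be sketched with only the standard library (a minimal sketch; it assumes the server above is running and the model name matches the served model):

```python
import json
import urllib.request

def build_chat_request(prompt, model="meta-llama/Llama-3.1-8B-Instruct",
                       base_url="http://localhost:8000/v1"):
    """Build the HTTP request that OpenAI-compatible clients send to vLLM."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def chat(prompt):
    """Send the request and extract the assistant's reply."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Seeing the raw request makes it clear why base_url is the only configuration LangChain needs: everything else follows the standard OpenAI wire format.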

Common variations

  • Use different models by changing the model name in the vllm serve command.
  • Run vLLM offline without internet or API keys.
  • Use the OpenAI SDK directly with base_url pointing to vLLM server for custom integrations.

Troubleshooting

  • If you get connection errors, ensure the vLLM server is running on the specified port.
  • Check firewall or network settings blocking localhost:8000.
  • Verify model files are downloaded and accessible by vLLM.
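A quick way to check the first two points is to probe the server's /health endpoint, which the vLLM OpenAI-compatible server exposes. A minimal sketch, assuming the default host and port:

```python
import urllib.request
import urllib.error

def server_url(host="localhost", port=8000, path="/health"):
    """Build the URL for a vLLM server endpoint."""
    return f"http://{host}:{port}{path}"

def check_vllm_server(host="localhost", port=8000, timeout=5):
    """Return True if the vLLM server answers on its /health endpoint."""
    try:
        with urllib.request.urlopen(server_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or DNS failure: server is unreachable
        return False
```

If this returns False while `vllm serve` is running, the port or host in base_url likely does not match the server's.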

Key takeaways

  • Run vLLM as a local server with the CLI command before querying from LangChain.
  • Configure LangChain's ChatOpenAI with base_url to point to the vLLM server for seamless integration.
  • No API key is needed when querying a local vLLM server, enabling offline usage.
  • You can switch models by changing the model argument in the vLLM serve command.
  • Ensure network and firewall settings allow connections to the vLLM server port.
Verified 2026-04 · meta-llama/Llama-3.1-8B-Instruct