How-to · Beginner · 3 min read

How to install vLLM

Quick answer
Install vLLM using pip install vllm in a Python 3.9+ environment (the default pip build targets Linux with an NVIDIA GPU). This installs the core library for efficient local large language model (LLM) inference.

PREREQUISITES

  • Python 3.9+
  • pip (Python package installer)
  • An NVIDIA GPU with CUDA drivers (the default pip build targets CUDA on Linux)
  • Basic command-line access
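You can sanity-check the first two prerequisites from Python itself. A minimal sketch (the 3.9 floor and the helper name `check_prereqs` are our assumptions, not part of vLLM):

```python
import shutil
import sys

def check_prereqs(min_version=(3, 9)):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < min_version:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {min_version[0]}.{min_version[1]}+ required, found {found}")
    if shutil.which("pip") is None and shutil.which("pip3") is None:
        problems.append("pip not found on PATH")
    return problems

print(check_prereqs() or "Environment looks ready for vLLM")
```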

Setup

Install vLLM via pip, ideally inside a virtual environment. Ensure you have Python 3.9 or newer installed.

bash
pip install vllm

Step by step

After installation, you can run a short script to load a model and generate text. Below is a minimal example that generates a completion for a single prompt.

python
from vllm import LLM, SamplingParams

# Initialize the LLM with a local model path or Hugging Face model ID
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Generate text with sampling parameters
outputs = llm.generate(["Hello, vLLM!"], SamplingParams(temperature=0.7, max_tokens=50))

# Print the generated text
print(outputs[0].outputs[0].text)
output (illustrative; actual text will vary)
Hello, vLLM! This is a sample output generated by the vLLM inference engine.

Common variations

You can run vLLM as a server via CLI and query it using the OpenAI-compatible API. Start the server with:

bash
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

Then query it in Python using the OpenAI SDK with base_url="http://localhost:8000/v1". This enables integration with existing OpenAI-compatible clients.

python
from openai import OpenAI

# vLLM does not require an API key by default, so any placeholder value works
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello from OpenAI client to vLLM server!"}]
)
print(response.choices[0].message.content)
output (illustrative; actual text will vary)
Hello from OpenAI client to vLLM server! This is a response generated by the vLLM model.
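Because the server speaks plain HTTP, you do not strictly need the OpenAI SDK. A standard-library sketch of the same call (the endpoint path and body fields follow the OpenAI chat-completions schema; the helper names `build_chat_request` and `ask` are ours):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # vLLM's OpenAI-compatible endpoint

def build_chat_request(model, user_message, max_tokens=64):
    """Build the JSON body for a POST to /chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

def ask(model, user_message):
    """Send one chat turn to a running vLLM server and return the reply text."""
    body = json.dumps(build_chat_request(model, user_message)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Calling ask() requires the server from the previous step to be running on port 8000.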

Troubleshooting

  • If you see ModuleNotFoundError, ensure vLLM is installed in your active Python environment.
  • If the model download fails, check your internet connection or specify a local model path.
  • For permission errors, install inside a virtual environment or with pip install --user vllm rather than using sudo.
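The first bullet can be checked programmatically. This sketch (the helper name `diagnose_vllm` is ours) reports which interpreter is active and whether vllm is importable from it:

```python
import importlib.util
import sys

def diagnose_vllm():
    """Report whether vLLM is importable from the active interpreter."""
    spec = importlib.util.find_spec("vllm")
    if spec is None:
        return f"vllm not found for {sys.executable}; try: pip install vllm"
    return f"vllm found at {spec.origin}"

print(diagnose_vllm())
```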

Key Takeaways

  • Use pip install vllm to install the vLLM Python package.
  • Run local LLM inference by loading models with the LLM class and calling generate().
  • Serve vLLM as an API server with vllm serve and query it via OpenAI-compatible clients.
  • Ensure Python 3.9+ and proper environment setup to avoid installation issues.
Verified 2026-04 · meta-llama/Llama-3.1-8B-Instruct