How-to · Beginner · 3 min read

How to install vLLM

Quick answer
Install vLLM using pip install vllm in a Python 3.9+ environment (the default pip build targets Linux with an NVIDIA GPU). This installs the core library for efficient local large language model (LLM) inference.

PREREQUISITES

  • Python 3.9+
  • pip (Python package installer)
  • An NVIDIA GPU with CUDA drivers (the default pip build targets CUDA on Linux)
  • Basic command-line access
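You can sanity-check the first two prerequisites from Python itself. A minimal sketch (the 3.9 floor and the helper name `check_prereqs` are our assumptions, not part of vLLM):

```python
import shutil
import sys

def check_prereqs(min_version=(3, 9)):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < min_version:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {min_version[0]}.{min_version[1]}+ required, found {found}")
    if shutil.which("pip") is None and shutil.which("pip3") is None:
        problems.append("pip not found on PATH")
    return problems

print(check_prereqs() or "Environment looks ready for vLLM")
```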

Setup

Install vLLM via pip, ideally inside a virtual environment. Ensure you have Python 3.9 or newer installed.

bash
pip install vllm

Step by step

After installation, you can run a short script to load a model and generate text. Below is a minimal example that generates a completion for a single prompt.

python
from vllm import LLM, SamplingParams

# Initialize the LLM with a local model path or Hugging Face model ID
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Generate text with sampling parameters
outputs = llm.generate(["Hello, vLLM!"], SamplingParams(temperature=0.7, max_tokens=50))

# Print the generated text
print(outputs[0].outputs[0].text)
output (illustrative; actual text will vary)
Hello, vLLM! This is a sample output generated by the vLLM inference engine.

Common variations

You can run vLLM as a server via CLI and query it using the OpenAI-compatible API. Start the server with:

bash
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

Then query it in Python using the OpenAI SDK with base_url="http://localhost:8000/v1". This enables integration with existing OpenAI-compatible clients.

python
from openai import OpenAI

# vLLM does not require an API key by default, so any placeholder value works
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello from OpenAI client to vLLM server!"}]
)
print(response.choices[0].message.content)
output (illustrative; actual text will vary)
Hello from OpenAI client to vLLM server! This is a response generated by the vLLM model.
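Because the server speaks plain HTTP, you do not strictly need the OpenAI SDK. A standard-library sketch of the same call (the endpoint path and body fields follow the OpenAI chat-completions schema; the helper names `build_chat_request` and `ask` are ours):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # vLLM's OpenAI-compatible endpoint

def build_chat_request(model, user_message, max_tokens=64):
    """Build the JSON body for a POST to /chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

def ask(model, user_message):
    """Send one chat turn to a running vLLM server and return the reply text."""
    body = json.dumps(build_chat_request(model, user_message)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Calling ask() requires the server from the previous step to be running on port 8000.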

Troubleshooting

  • If you see ModuleNotFoundError, ensure vLLM is installed in your active Python environment.
  • If the model download fails, check your internet connection or specify a local model path.
  • For permission errors, install inside a virtual environment or with pip install --user vllm rather than using sudo.
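The first bullet can be checked programmatically. This sketch (the helper name `diagnose_vllm` is ours) reports which interpreter is active and whether vllm is importable from it:

```python
import importlib.util
import sys

def diagnose_vllm():
    """Report whether vLLM is importable from the active interpreter."""
    spec = importlib.util.find_spec("vllm")
    if spec is None:
        return f"vllm not found for {sys.executable}; try: pip install vllm"
    return f"vllm found at {spec.origin}"

print(diagnose_vllm())
```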

Key Takeaways

  • Use pip install vllm to install the vLLM Python package.
  • Run local LLM inference by loading models with the LLM class and calling generate().
  • Serve vLLM as an API server with vllm serve and query it via OpenAI-compatible clients.
  • Ensure Python 3.9+ and proper environment setup to avoid installation issues.
Verified 2026-04 · meta-llama/Llama-3.1-8B-Instruct