How-to · Beginner · 3 min read

How to generate text with vLLM

Quick answer
Use the vllm Python library to generate text by loading a model with LLM(model="model-name") and calling generate() with prompts and SamplingParams. For server usage, run vllm serve and query via the OpenAI-compatible API.

PREREQUISITES

  • Python 3.9 or newer (check the vLLM docs for the currently supported range)
  • pip install vllm
  • A CUDA-capable GPU is recommended; vLLM also offers CPU and other hardware backends
  • A vLLM-compatible model checkpoint (local path or Hugging Face repo ID)
  • For server mode: no real OpenAI API key is needed; any placeholder works unless the server is started with --api-key

Setup

Install the vllm library via pip and prepare your environment. vLLM requires Python 3.9 or newer and runs best on a CUDA-capable GPU.

Run:

bash
pip install vllm

Step by step

This example shows how to generate text offline using vllm in Python. It loads a local model and generates text from a prompt.

python
from vllm import LLM, SamplingParams

# Load the model (replace with your local model path or HuggingFace repo)
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Define prompt and sampling parameters
prompt = "Write a short poem about AI."
sampling_params = SamplingParams(temperature=0.7, max_tokens=100)

# Generate text
outputs = llm.generate([prompt], sampling_params)

# Extract and print the generated text
print(outputs[0].outputs[0].text)
output
In circuits deep and data streams,
A mind awakes from coded dreams.
With logic bright and vision clear,
AI's voice is drawing near.
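The temperature value in SamplingParams controls how random generation is: it rescales the model's token probabilities before each token is drawn. As a rough illustration of the idea, here is a stdlib-only sketch of temperature sampling (this is a conceptual toy, not vLLM's implementation):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=None):
    """Sample an index from raw logits after temperature-scaled softmax."""
    rng = rng or random.Random()
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]
# Low temperature concentrates probability on the highest logit (near-greedy);
# high temperature flattens the distribution (more varied output).
print(sample_with_temperature(logits, temperature=0.1))
```

This is why temperature=0.7 in the example above gives moderately creative text: it is random sampling, but still biased toward the model's most likely tokens.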

Common variations

You can run vllm as a server to serve models via an OpenAI-compatible API endpoint. Start the server with:

vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

Then query it using the OpenAI Python SDK:

python
from openai import OpenAI

# The vLLM server ignores this key unless it was started with --api-key,
# so any placeholder value works.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Write a short poem about AI."}]
)

print(response.choices[0].message.content)
output
In circuits deep and data streams,
A mind awakes from coded dreams.
With logic bright and vision clear,
AI's voice is drawing near.
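If you prefer not to install the OpenAI SDK, the endpoint is plain HTTP, so the standard library is enough. A minimal sketch (the helper names here are my own, and it assumes a server running on port 8000):

```python
import json
import urllib.request

def build_chat_payload(model: str, user_message: str, temperature: float = 0.7) -> dict:
    """Build the JSON body for a /v1/chat/completions request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
    }

def query_vllm_server(base_url: str, payload: dict) -> dict:
    """POST the payload to the OpenAI-compatible endpoint and parse the reply."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires a running `vllm serve` instance):
# payload = build_chat_payload("meta-llama/Llama-3.1-8B-Instruct",
#                              "Write a short poem about AI.")
# reply = query_vllm_server("http://localhost:8000/v1", payload)
# print(reply["choices"][0]["message"]["content"])
```

Because the API is OpenAI-compatible, the same request shape works from curl, any HTTP client, or any OpenAI SDK in another language.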

Troubleshooting

  • If you get ModuleNotFoundError, ensure vllm is installed with pip install vllm.
  • If the model path is invalid, verify the model name or local checkpoint path.
  • For server mode, confirm the server is running on the specified port before querying.
  • If the server was started with --api-key, pass the same key to the OpenAI client; otherwise any placeholder value (e.g. "EMPTY") works.
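To check that the server is reachable before sending prompts, a small stdlib probe against the /v1/models endpoint works (the helper name is my own):

```python
import urllib.request

def server_is_up(base_url: str = "http://localhost:8000/v1") -> bool:
    """Return True if the vLLM server answers the /models endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

# print(server_is_up())  # False unless `vllm serve` is running locally
```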

Key Takeaways

  • Use LLM and SamplingParams from vllm for offline text generation.
  • Run vllm serve to start a local server with OpenAI-compatible API.
  • Query the running vllm server using the OpenAI Python SDK with base_url set.
  • Always install vllm via pip and verify model paths to avoid errors.
  • When the server enforces an API key (--api-key), load it from an environment variable rather than hardcoding it.
Verified 2026-04 · meta-llama/Llama-3.1-8B-Instruct