vLLM logs and monitoring
Quick answer
Use the vllm CLI with the --log-file option, or configure logging in Python by setting up the logging module before calling LLM.generate(). For monitoring, parse the log files or integrate with external monitoring tools by capturing metrics from the logs or server output.

Prerequisites
- Python 3.8+
- pip install vllm
- Basic knowledge of Python logging
Setup logging for vLLM
vLLM supports logging through standard Python logging and CLI flags. To capture detailed logs, configure Python's logging module or use the CLI --log-file option to write logs to a file.
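Besides calling basicConfig on the root logger, you can attach a handler to just the library's logger namespace. Here is a minimal sketch using only the standard library; the filename vllm_debug.log is an arbitrary choice, and "vllm" is assumed to be the library's top-level logger name:

```python
import logging

# Attach a file handler to the "vllm" logger namespace (assumed to be the
# library's top-level logger name) so its records are captured without
# reconfiguring the root logger.
vllm_logger = logging.getLogger("vllm")
vllm_logger.setLevel(logging.DEBUG)

handler = logging.FileHandler("vllm_debug.log")  # arbitrary filename
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
vllm_logger.addHandler(handler)

# Any logger under the "vllm." namespace now writes to vllm_debug.log
logging.getLogger("vllm.engine").info("engine log captured")
```

Scoping the handler this way keeps vLLM's records separate from your application's own logs.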
pip install vllm

Step by step: Enable logs and monitor usage
This example shows how to enable logging in Python when using vllm for inference and how to monitor logs for usage insights.
import logging

# Configure logging to file and console before importing vLLM,
# so handlers are in place when vLLM's loggers start emitting
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
    handlers=[
        logging.FileHandler('vllm_inference.log'),
        logging.StreamHandler()
    ]
)

from vllm import LLM, SamplingParams

# Initialize the LLM
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Generate text with logging enabled
outputs = llm.generate(
    ["Explain the benefits of logging in AI model serving."],
    SamplingParams(temperature=0.7),
)

# Print the generated text
print(outputs[0].outputs[0].text)

# Logs are saved in 'vllm_inference.log' for monitoring and debugging
Example output:
Logging helps track model usage, performance, and errors, enabling better debugging and optimization.
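As the quick answer notes, monitoring can be as simple as parsing the log file for metrics. A minimal sketch; the sample lines are made up, but follow the '%(asctime)s %(levelname)s %(message)s' format configured above:

```python
import re
from collections import Counter

# Hypothetical log lines in the '%(asctime)s %(levelname)s %(message)s' format
sample = [
    "2024-05-01 12:00:01,100 INFO Engine started",
    "2024-05-01 12:00:02,200 INFO Generated 1 request",
    "2024-05-01 12:00:03,300 WARNING Slow request: 4.2s",
]

def count_levels(lines):
    """Tally log levels so spikes in WARNING/ERROR are easy to spot."""
    pattern = re.compile(r"^\S+ \S+ (\w+) ")
    return Counter(m.group(1) for line in lines if (m := pattern.match(line)))

print(count_levels(sample))  # -> Counter({'INFO': 2, 'WARNING': 1})
```

In practice you would read the lines from vllm_inference.log and feed the counts to whatever monitoring tool you use.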
Common variations
- Use the CLI command vllm generate --log-file logs.txt to save logs directly when running inference from the terminal.
- For async usage, integrate logging the same way by configuring logging before making async calls.
- Adjust the logging level (DEBUG, INFO, WARNING) to control verbosity.
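For the last variation, the level can also be adjusted at runtime on a specific logger namespace. A small sketch; set_vllm_verbosity is a hypothetical helper, and "vllm" is assumed to be the library's logger name:

```python
import logging

# Map a verbosity string (e.g. from a config file or CLI flag) to a level
LEVELS = {"debug": logging.DEBUG, "info": logging.INFO, "warning": logging.WARNING}

def set_vllm_verbosity(name: str) -> None:
    """Adjust verbosity of the 'vllm' logger namespace at runtime."""
    logging.getLogger("vllm").setLevel(LEVELS[name.lower()])

set_vllm_verbosity("debug")
print(logging.getLogger("vllm").level == logging.DEBUG)  # -> True
```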
vllm generate --model meta-llama/Llama-3.1-8B-Instruct --log-file vllm_cli.log "What is vLLM?"

Output:
INFO: Starting inference with model meta-llama/Llama-3.1-8B-Instruct
INFO: Generated output: vLLM is a high-performance inference engine for large language models...
Troubleshooting
- If logs are not appearing, ensure logging.basicConfig is called before any vllm usage (including the import).
- Check file permissions for the log file path.
- For missing or incomplete logs, increase the logging level to DEBUG.
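The permissions item above can be checked programmatically before the handler is created, giving a clear error early instead of a failure inside logging setup. A small sketch; log_path_writable is a hypothetical helper:

```python
import os

def log_path_writable(path: str) -> bool:
    """Return True if the log file is writable, or can be created in its directory."""
    path = os.path.abspath(path)
    if os.path.exists(path):
        return os.access(path, os.W_OK)
    return os.access(os.path.dirname(path), os.W_OK)

# Check before calling logging.FileHandler('vllm_inference.log')
print(log_path_writable("vllm_inference.log"))
```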
Key Takeaways
- Configure Python logging before using vllm to capture detailed inference logs.
- Use the CLI --log-file option for quick logging without code changes.
- Monitor logs to track model usage and performance and to troubleshoot issues effectively.