vLLM logs and monitoring
Quick answer
Use the vllm CLI with the --log-file option, or configure logging in Python by setting up the logging module before calling LLM.generate(). For monitoring, parse the log files or integrate with external monitoring tools by capturing metrics from the logs or server output.

Prerequisites
- Python 3.8+
- pip install vllm
- Basic knowledge of Python logging
Setup logging for vLLM
vLLM supports logging through standard Python logging and CLI flags. To capture detailed logs, configure Python's logging module or use the CLI --log-file option to write logs to a file.
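Besides calling basicConfig on the root logger, you can attach a handler to just the library's logger namespace. Here is a minimal sketch using only the standard library; the filename vllm_debug.log is an arbitrary choice, and "vllm" is assumed to be the library's top-level logger name:

```python
import logging

# Attach a file handler to the "vllm" logger namespace (assumed to be the
# library's top-level logger name) so its records are captured without
# reconfiguring the root logger.
vllm_logger = logging.getLogger("vllm")
vllm_logger.setLevel(logging.DEBUG)

handler = logging.FileHandler("vllm_debug.log")  # arbitrary filename
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
vllm_logger.addHandler(handler)

# Any logger under the "vllm." namespace now writes to vllm_debug.log
logging.getLogger("vllm.engine").info("engine log captured")
```

Scoping the handler this way keeps vLLM's records separate from your application's own logs.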
pip install vllm

Step by step: Enable logs and monitor usage
This example shows how to enable logging in Python when using vllm for inference and how to monitor logs for usage insights.
import logging

# Configure logging to file and console before importing vLLM,
# so handlers are in place when vLLM's loggers start emitting
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
    handlers=[
        logging.FileHandler('vllm_inference.log'),
        logging.StreamHandler()
    ]
)

from vllm import LLM, SamplingParams

# Initialize the LLM
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Generate text with logging enabled
outputs = llm.generate(
    ["Explain the benefits of logging in AI model serving."],
    SamplingParams(temperature=0.7),
)

# Print the generated text
print(outputs[0].outputs[0].text)

# Logs are saved in 'vllm_inference.log' for monitoring and debugging
Example output:
Logging helps track model usage, performance, and errors, enabling better debugging and optimization.
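As the quick answer notes, monitoring can be as simple as parsing the log file for metrics. A minimal sketch; the sample lines are made up, but follow the '%(asctime)s %(levelname)s %(message)s' format configured above:

```python
import re
from collections import Counter

# Hypothetical log lines in the '%(asctime)s %(levelname)s %(message)s' format
sample = [
    "2024-05-01 12:00:01,100 INFO Engine started",
    "2024-05-01 12:00:02,200 INFO Generated 1 request",
    "2024-05-01 12:00:03,300 WARNING Slow request: 4.2s",
]

def count_levels(lines):
    """Tally log levels so spikes in WARNING/ERROR are easy to spot."""
    pattern = re.compile(r"^\S+ \S+ (\w+) ")
    return Counter(m.group(1) for line in lines if (m := pattern.match(line)))

print(count_levels(sample))  # -> Counter({'INFO': 2, 'WARNING': 1})
```

In practice you would read the lines from vllm_inference.log and feed the counts to whatever monitoring tool you use.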
Common variations
- Use the CLI command vllm generate --log-file logs.txt to save logs directly when running inference from the terminal.
- For async usage, integrate logging the same way by configuring logging before making async calls.
- Adjust the logging level (DEBUG, INFO, WARNING) to control verbosity.
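For the last variation, the level can also be adjusted at runtime on a specific logger namespace. A small sketch; set_vllm_verbosity is a hypothetical helper, and "vllm" is assumed to be the library's logger name:

```python
import logging

# Map a verbosity string (e.g. from a config file or CLI flag) to a level
LEVELS = {"debug": logging.DEBUG, "info": logging.INFO, "warning": logging.WARNING}

def set_vllm_verbosity(name: str) -> None:
    """Adjust verbosity of the 'vllm' logger namespace at runtime."""
    logging.getLogger("vllm").setLevel(LEVELS[name.lower()])

set_vllm_verbosity("debug")
print(logging.getLogger("vllm").level == logging.DEBUG)  # -> True
```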
vllm generate --model meta-llama/Llama-3.1-8B-Instruct --log-file vllm_cli.log "What is vLLM?"

Output:
INFO: Starting inference with model meta-llama/Llama-3.1-8B-Instruct
INFO: Generated output: vLLM is a high-performance inference engine for large language models...
Troubleshooting
- If logs are not appearing, ensure logging.basicConfig is called before any vllm usage (including the import).
- Check file permissions for the log file path.
- For missing or incomplete logs, increase the logging level to DEBUG.
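The permissions item above can be checked programmatically before the handler is created, giving a clear error early instead of a failure inside logging setup. A small sketch; log_path_writable is a hypothetical helper:

```python
import os

def log_path_writable(path: str) -> bool:
    """Return True if the log file is writable, or can be created in its directory."""
    path = os.path.abspath(path)
    if os.path.exists(path):
        return os.access(path, os.W_OK)
    return os.access(os.path.dirname(path), os.W_OK)

# Check before calling logging.FileHandler('vllm_inference.log')
print(log_path_writable("vllm_inference.log"))
```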
Key Takeaways
- Configure Python logging before using vllm to capture detailed inference logs.
- Use the CLI --log-file option for quick logging without code changes.
- Monitor logs to track model usage and performance and to troubleshoot issues effectively.