ConnectionRefusedError
builtins.ConnectionRefusedError
Stack trace
Traceback (most recent call last):
  File "app.py", line 42, in <module>
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
  File "/usr/local/lib/python3.9/site-packages/vllm/api_client.py", line 88, in create
    raise ConnectionRefusedError("Connection refused by the vLLM server")
builtins.ConnectionRefusedError: Connection refused by the vLLM server
Why it happens
This error occurs when a client for vLLM's OpenAI-compatible API tries to connect to the vLLM server and the connection is refused, which usually means nothing is accepting connections at the target address and port. Common causes include the vLLM server not running, an incorrect server address or port, a firewall rejecting the connection, or network issues that prevent the client from reaching the server.
Detection
Monitor connection attempts to the vLLM server and catch ConnectionRefusedError exceptions to log and alert on connection failures before the application crashes.
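As a rough illustration of the detection step, the sketch below wraps any request in a helper that catches ConnectionRefusedError, logs it, and hands the message to an alert hook instead of letting the application crash. The names call_and_report, alert, and the logger name are hypothetical and not part of vLLM. Some HTTP client libraries wrap a refused connection in their own exception type, so the except clause may need to list that type as well.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("vllm_client")  # hypothetical logger name


def call_and_report(request: Callable[[], T],
                    alert: Callable[[str], None]) -> Optional[T]:
    """Run request(); on a refused connection, log it, alert, and return None."""
    try:
        return request()
    except ConnectionRefusedError as exc:
        message = f"vLLM server refused the connection: {exc}"
        logger.error(message)
        alert(message)  # hypothetical hook, e.g. a pager or chat notification
        return None
```

A call site might look like call_and_report(lambda: client.chat.completions.create(...), alert=notify), where notify is whatever alerting function the application already has.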
Causes & fixes
vLLM server process is not running or crashed
Start or restart the vLLM server process and ensure it is listening on the expected host and port.
Client is configured with incorrect host or port for the vLLM server
Verify and update the client configuration to use the correct host and port where the vLLM server is running.
Firewall or network security group is blocking the connection to the vLLM server port
Configure firewall rules or network security groups to allow inbound connections on the vLLM server port.
vLLM server is overloaded or temporarily refusing new connections
Check server resource usage and logs; scale resources or restart the server to restore connectivity.
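To help tell these causes apart, a plain TCP probe against the configured host and port can be useful. The sketch below uses only the standard library; probe and its return labels are illustrative names, and the interpretation of each label is a rule of thumb rather than a guarantee.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"      # something accepts connections on this port
    except ConnectionRefusedError:
        return "refused"            # host reachable, nothing listening (or actively rejected)
    except socket.gaierror:
        return "unresolvable"       # host name could not be resolved
    except socket.timeout:
        return "timed out"          # no answer within the timeout
    except OSError:
        return "unreachable"        # any other network-level failure
```

Roughly, "refused" points at a stopped server or a wrong port, "unresolvable" at a wrong host name, and "timed out" often at a firewall that silently drops packets.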
Code: broken vs fixed
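# Broken: hard-coded endpoint; the call fails if nothing is listening there
# The vLLM server speaks the OpenAI API, so the standard OpenAI client is used to reach it
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(model="gpt-4o-mini", messages=[{"role": "user", "content": "Hello"}])  # connection refused here
# Fixed: read the endpoint from the environment and ensure the server is running and reachable
import os
from openai import OpenAI
# Use environment variables for configuration
vllm_host = os.environ.get("VLLM_API_HOST", "http://localhost:8000")
client = OpenAI(base_url=f"{vllm_host}/v1", api_key="EMPTY")  # the key is only checked if the server was started with --api-key
response = client.chat.completions.create(model="gpt-4o-mini", messages=[{"role": "user", "content": "Hello"}])  # model must match the name the server serves
print(response)
Workaround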
Wrap the API call in try/except ConnectionRefusedError, log the failure, and implement a retry with exponential backoff to handle temporary connection issues.
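A minimal sketch of that workaround follows, assuming the request is passed in as a zero-argument callable. The helper name retry_on_refused, the default attempt count, and the base delay are illustrative choices, not values recommended by vLLM.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("vllm_client")  # hypothetical logger name


def retry_on_refused(request: Callable[[], T],
                     attempts: int = 5,
                     base_delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call request(), retrying refused connections with exponential backoff."""
    for attempt in range(attempts):
        try:
            return request()
        except ConnectionRefusedError as exc:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            delay = base_delay * 2 ** attempt  # 0.5 s, 1 s, 2 s, ...
            logger.warning("Connection refused (%s); retrying in %.1f s", exc, delay)
            sleep(delay)
    raise ValueError("attempts must be at least 1")
```

The sleep parameter exists only so the delay can be replaced in tests; in production code, adding random jitter to the delay helps avoid many clients retrying in lockstep.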
Prevention
Deploy health checks and monitoring on the vLLM server to ensure it is always running and reachable; use service discovery or environment variables to keep client configuration in sync with server endpoints.
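One way to combine both ideas is sketched below: the endpoint comes from the VLLM_API_HOST environment variable used in the example above, and a small check polls a health URL before traffic is sent. The /health path is an assumption about how the server is deployed, so confirm it against the vLLM version in use; endpoint_from_env and is_healthy are hypothetical helper names.

```python
import os
import urllib.error
import urllib.request
from typing import Mapping


def endpoint_from_env(env: Mapping[str, str] = os.environ) -> str:
    """Read the server base URL from VLLM_API_HOST, without a trailing slash."""
    return env.get("VLLM_API_HOST", "http://localhost:8000").rstrip("/")


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        # Assumption: the server exposes a health route at /health.
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, DNS failures, and HTTP errors.
        return False
```

A monitoring job or readiness probe could call is_healthy(endpoint_from_env()) on a schedule and alert when it returns False.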