Critical severity · Intermediate · Fix: 5-10 min

ConnectionRefusedError

builtins.ConnectionRefusedError

What this error means

The client could not open a connection to the vLLM OpenAI-compatible API server: the connection attempt was actively rejected, typically because nothing is listening on the configured host and port.

Stack trace

traceback

Traceback (most recent call last):
  File "app.py", line 42, in <module>
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
  File "/usr/local/lib/python3.9/site-packages/vllm/api_client.py", line 88, in create
    raise ConnectionRefusedError("Connection refused by the vLLM server")
builtins.ConnectionRefusedError: Connection refused by the vLLM server

QUICK FIX

Ensure the vLLM server is running and reachable at the configured host and port before making API calls.

Why it happens

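This error occurs when the client attempts to connect to the vLLM OpenAI-compatible API server and the connection is refused. A refusal means the attempt reached a host that rejected it, most often because no process is listening on that port. Common causes include the vLLM server not running, an incorrect server address or port in the client configuration, and a firewall that rejects the connection. A firewall or network fault that silently drops traffic usually shows up as a timeout rather than a refusal.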
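To tell these causes apart, it can help to probe the port directly. The following is a minimal sketch using only the standard library; the function name `diagnose` and its return labels are illustrative, not part of vLLM or any client library.

```python
import socket


def diagnose(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify the outcome of a plain TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # something is listening and accepted the connection
    except ConnectionRefusedError:
        return "refused"  # host answered, but nothing accepts on this port
    except (socket.timeout, TimeoutError):
        return "timeout"  # no answer at all, e.g. traffic silently dropped
    except socket.gaierror:
        return "dns_error"  # the hostname could not be resolved
    except OSError:
        return "other"  # e.g. network unreachable


if __name__ == "__main__":
    # Host and port are assumptions; use the values your client is configured with.
    print(diagnose("localhost", 8000))
```

A result of "refused" points at the server process or the port number, while "timeout" points more toward filtering or routing between client and server.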

Detection

Monitor connection attempts to the vLLM server and catch ConnectionRefusedError exceptions to log and alert on connection failures before the application crashes.
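One way to sketch this detection step is a small context manager that logs and counts refusals, then re-raises so the caller still decides what to do. The names `log_refused` and `failures` are hypothetical, and the in-process counter stands in for whatever metrics or alerting system you actually use.

```python
import logging
from collections import Counter
from contextlib import contextmanager

logger = logging.getLogger("vllm_client")

# Hypothetical in-process counter; a real deployment would export a metric instead.
failures = Counter()


@contextmanager
def log_refused(endpoint: str):
    """Log and count ConnectionRefusedError raised inside the block, then re-raise."""
    try:
        yield
    except ConnectionRefusedError:
        failures["refused"] += 1
        logger.error(
            "vLLM endpoint %s refused the connection (refusals so far: %d)",
            endpoint,
            failures["refused"],
        )
        raise
```

Wrap each API call in `with log_refused(endpoint):`. If your client library wraps the refusal in its own exception type, catch that type in place of `ConnectionRefusedError`.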

Causes & fixes

vLLM server process is not running or crashed

✓ Fix

Start or restart the vLLM server process and ensure it is listening on the expected host and port.

Client is configured with incorrect host or port for the vLLM server

✓ Fix

Verify and update the client configuration to use the correct host and port where the vLLM server is running.

Firewall or network security group is blocking the connection to the vLLM server port

✓ Fix

Configure firewall rules or network security groups to allow inbound connections on the vLLM server port.

vLLM server is overloaded or temporarily refusing new connections

✓ Fix

Check server resource usage and logs; scale resources or restart the server to restore connectivity.
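For the incorrect host or port cause above, a quick check is to print exactly which host and port the client will end up using. This sketch assumes the server URL lives in the `VLLM_API_HOST` environment variable used in this page's fixed example; that name is an application-level convention, not something vLLM itself reads, and `resolve_endpoint` is a hypothetical helper.

```python
import os
from urllib.parse import urlsplit


def resolve_endpoint(default: str = "http://localhost:8000"):
    """Return the (host, port) the client will connect to, or raise ValueError."""
    base_url = os.environ.get("VLLM_API_HOST", default)
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        # Catches values such as "localhost:8000" that are missing the scheme.
        raise ValueError(f"VLLM_API_HOST is not a valid http(s) URL: {base_url!r}")
    # Fall back to the scheme's default port when none is given explicitly.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


if __name__ == "__main__":
    print(resolve_endpoint())
```

Compare the printed values against the host and port the server was actually started with.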

Code: broken vs fixed

Broken - triggers the error

python

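from openai import OpenAI

# The OpenAI-compatible server is reached with the openai client pointed at its /v1 endpoint.
# Nothing is listening on localhost:8000 when this runs.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(model="gpt-4o-mini", messages=[{"role": "user", "content": "Hello"}])  # connection refused here; the openai client reports it as openai.APIConnectionError

Fixed - works correctly

python

import os
from openai import OpenAI

# Read the server URL from the environment so the client follows the deployment
base_url = os.environ.get("VLLM_API_HOST", "http://localhost:8000/v1")
client = OpenAI(base_url=base_url, api_key="EMPTY")  # placeholder key; only checked if the server was started with --api-key

# Succeeds once the vLLM server is running and reachable at base_url;
# the model name must match one the server is serving
response = client.chat.completions.create(model="gpt-4o-mini", messages=[{"role": "user", "content": "Hello"}])
print(response)

The fixed version reads the server URL from an environment variable, so the client configuration follows the actual server address. The call still succeeds only when the vLLM server is running and reachable at that URL.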

⚠

Workaround

Wrap the API call in try/except ConnectionRefusedError, log the failure, and implement a retry with exponential backoff to handle temporary connection issues.
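The workaround could look roughly like the sketch below. `call_with_retry` and its parameters are illustrative names, and the delays are arbitrary starting points rather than recommended values.

```python
import logging
import time

logger = logging.getLogger("vllm_client")


def call_with_retry(fn, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call fn(), retrying on ConnectionRefusedError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectionRefusedError as exc:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure
            # Delay doubles each attempt (0.5, 1, 2, ...) and is capped at max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            logger.warning(
                "Connection refused (attempt %d/%d): %s; retrying in %.1fs",
                attempt, max_attempts, exc, delay,
            )
            sleep(delay)
```

Pass the API call as a zero-argument callable, for example `call_with_retry(lambda: client.chat.completions.create(...))`. If your client wraps the refusal in its own exception (the openai package raises `openai.APIConnectionError`), catch that type instead. The `sleep` parameter exists only so the backoff can be tested without waiting.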

✓

Prevention

Deploy health checks and monitoring on the vLLM server to ensure it is always running and reachable; use service discovery or environment variables to keep client configuration in sync with server endpoints.
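A simple client-side health gate might poll the server before the application sends its first request. This sketch assumes the server answers `GET /health` with HTTP 200 once it is ready; verify that path against the vLLM version you run. `wait_until_healthy` is a hypothetical helper.

```python
import time
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy so the check hits the server directly.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_until_healthy(base_url, path="/health", timeout=60.0, interval=1.0, request_timeout=2.0):
    """Poll base_url + path until it returns HTTP 200; return False if timeout elapses first."""
    url = base_url.rstrip("/") + path
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _opener.open(url, timeout=request_timeout) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not ready: a refused connection arrives wrapped in URLError,
            # and non-2xx responses arrive as HTTPError (a URLError subclass).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # The URL is an assumption; point it at the server root, not the /v1 prefix.
    print(wait_until_healthy("http://localhost:8000", timeout=10.0))
```

The same function can back a startup check in the application or a periodic probe that feeds your monitoring.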

Python 3.9+ · vllm >=0.1.0 · tested on 0.3.x

Verified 2026-04
