ConnectionError
requests.exceptions.ConnectionError
Stack trace
Traceback (most recent call last):
  File "app.py", line 42, in <module>
    response = client.generate(prompt)
  File "/usr/local/lib/python3.9/site-packages/llamacpp/client.py", line 88, in generate
    resp = requests.post(self.endpoint, json=payload, timeout=10)
  File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 119, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 643, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /generate (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c2a4d2d60>: Failed to establish a new connection: [Errno 111] Connection refused'))
Why it happens
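This error occurs because the Python client cannot establish a network connection to the llama.cpp server endpoint. In the trace above, [Errno 111] Connection refused means that nothing accepted the TCP connection on localhost port 5000. Common reasons include the server not running, an incorrect endpoint host or port, a firewall blocking the connection, or an unreachable network route to the server.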
Detection
Catch connection exceptions raised by the client library and log each failure together with the endpoint it targeted, so that an unreachable server is detected before the error propagates and crashes the app.
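
A minimal sketch of that logging step, assuming the client talks to the server through requests.post as the stack trace suggests. The function name post_with_logging and the logger name llm_client are hypothetical and only serve the illustration.

```python
import logging

import requests

logger = logging.getLogger("llm_client")  # hypothetical logger name


def post_with_logging(endpoint, payload, timeout=10):
    """POST payload to endpoint; log connection failures before re-raising."""
    try:
        return requests.post(endpoint, json=payload, timeout=timeout)
    except requests.exceptions.ConnectionError as exc:
        # Record which endpoint was unreachable so the failure shows up in logs.
        logger.error("Connection to %s failed: %s", endpoint, exc)
        raise  # the caller decides whether to retry or abort
```

Re-raising keeps the behavior of the original call unchanged; the only addition is a log record that names the endpoint.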
Causes & fixes
llama.cpp server process is not running or crashed
Start or restart the llama.cpp server process and verify it is listening on the expected port.
Incorrect server endpoint URL or port configured in the client
Check and correct the endpoint URL and port in your client configuration to match the running server.
Firewall or network rules blocking connection to the server port
Ensure firewall rules allow traffic on the server port and that no network policies block localhost or remote connections.
Server is overloaded or temporarily unreachable
Implement retry logic with exponential backoff in the client and monitor server health to handle transient unavailability.
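
To tell the first three causes apart from application-level problems, it can help to test the TCP port directly, without going through the HTTP client. The sketch below uses only the standard library; the helper name is_port_open is hypothetical.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the endpoint in the stack trace, is_port_open("localhost", 5000) returning False would point to a server that is not running, a wrong port, or a blocked connection, rather than to a problem with the request itself.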
Code: broken vs fixed
# Broken: the client points at a port where nothing is listening
import requests
endpoint = "http://localhost:5001/generate"  # Incorrect port (5001 is only an example; the server listens on 5000)
payload = {"prompt": "Hello"}
response = requests.post(endpoint, json=payload, timeout=10)  # This line raises ConnectionError
print(response.json())

# Fixed: the endpoint matches the running server
import requests
endpoint = "http://localhost:5000/generate"  # Correct host, port, and path
payload = {"prompt": "Hello"}
response = requests.post(endpoint, json=payload, timeout=10)
response.raise_for_status()  # A wrong path alone yields an HTTP error such as 404, not ConnectionError
print(response.json())  # Should print the server response
Workaround
Wrap the request call in try/except requests.exceptions.ConnectionError (the built-in ConnectionError does not catch it), log the failure, and retry after a short delay to handle temporary server downtime.
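
The retry approach could look like the following sketch, which also applies the exponential backoff mentioned under Causes & fixes. The function name post_with_retry, the logger name, and the default attempt count and delays are assumptions chosen for illustration, not values taken from any library.

```python
import logging
import time

import requests

logger = logging.getLogger("llm_client")  # hypothetical logger name


def post_with_retry(endpoint, payload, attempts=4, base_delay=0.5, timeout=10):
    """POST with retries on ConnectionError, doubling the delay each time."""
    for attempt in range(1, attempts + 1):
        try:
            return requests.post(endpoint, json=payload, timeout=timeout)
        except requests.exceptions.ConnectionError as exc:
            if attempt == attempts:
                raise  # out of attempts: let the caller handle the failure
            delay = base_delay * 2 ** (attempt - 1)  # 0.5 s, 1 s, 2 s, ...
            logger.warning(
                "Attempt %d/%d to %s failed (%s); retrying in %.1f s",
                attempt, attempts, endpoint, exc, delay,
            )
            time.sleep(delay)
```

A call such as post_with_retry("http://localhost:5000/generate", {"prompt": "Hello"}) only masks short outages. If the server stays down, the final attempt still raises, so the failure is not silently swallowed.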
Prevention
Implement health checks and monitoring for the llama.cpp server process and validate client endpoint configuration during deployment to avoid connection failures.
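
One way to validate the client endpoint configuration at deployment time is a small check that runs before the app starts. The sketch below uses the standard library only; validate_endpoint and the specific rules it applies are assumptions for illustration, and a real deployment may need stricter or different checks.

```python
from urllib.parse import urlsplit


def validate_endpoint(url):
    """Return a list of problems found in the endpoint URL (empty if none)."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append(f"unsupported or missing scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host")
    try:
        parts.port  # accessing .port raises ValueError for an invalid port
    except ValueError:
        problems.append("port is not a valid number")
    if parts.path in ("", "/"):
        problems.append("missing request path")
    return problems
```

An empty list for "http://localhost:5000/generate" means the URL is well formed; it does not prove the server is reachable, so this check complements a health check rather than replacing it.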