RunPodEndpointNotReadyError
runpod.client.errors.RunPodEndpointNotReadyError
Stack trace
runpod.client.errors.RunPodEndpointNotReadyError: Endpoint is not ready yet. Cold start in progress. Please retry after some time.
Why it happens
RunPod deploys models in containers that can take from several seconds to minutes to initialize on the first request (a cold start). Until initialization finishes, the endpoint cannot accept inference calls, and requests fail with this error.
Detection
Monitor API responses for RunPodEndpointNotReadyError exceptions and log their timestamps so that cold start delays are detected before they impact users.
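One way to record cold start durations is to wrap each call and note the time between the first not-ready error and the next successful response. The following is a minimal sketch, not part of the RunPod SDK: `ColdStartMonitor` is a hypothetical helper, and `EndpointNotReadyError` is a local stand-in for RunPodEndpointNotReadyError so the example runs without the SDK installed.

```python
import logging
import time

logger = logging.getLogger("cold_start_monitor")


class EndpointNotReadyError(Exception):
    """Local stand-in for RunPodEndpointNotReadyError (assumption for this sketch)."""


class ColdStartMonitor:
    """Hypothetical helper that logs not-ready errors and measures cold start time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._first_failure = None  # time of the first not-ready error in a streak
        self.durations = []         # observed cold start durations, in seconds

    def call(self, fn, *args, **kwargs):
        try:
            result = fn(*args, **kwargs)
        except EndpointNotReadyError:
            if self._first_failure is None:
                self._first_failure = self._clock()
            logger.warning("Endpoint not ready (cold start in progress)")
            raise  # let the caller decide whether to retry
        if self._first_failure is not None:
            elapsed = self._clock() - self._first_failure
            self.durations.append(elapsed)
            logger.info("Endpoint became ready after %.1f s", elapsed)
            self._first_failure = None
        return result
```

The monitor re-raises the error unchanged, so it can sit underneath whatever retry logic is already in place. In real use, the `except` clause would name the SDK's own exception class instead of the stand-in.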
Causes & fixes
Model container is still starting up after deployment or scale-up
Implement retry logic with exponential backoff to wait for the endpoint to become ready before sending inference requests.
Sending inference requests immediately after deployment without readiness checks
Add a health check or readiness probe to confirm the endpoint is ready before routing traffic.
Insufficient resources causing slow container startup
Increase allocated CPU/memory resources for the RunPod deployment to reduce cold start time.
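The exponential backoff mentioned in the first fix could look roughly like the sketch below. It is an illustration under stated assumptions: `call_with_backoff` and its parameters are hypothetical names, and `EndpointNotReadyError` is a local stand-in for RunPodEndpointNotReadyError so the code runs on its own.

```python
import random
import time


class EndpointNotReadyError(Exception):
    """Local stand-in for RunPodEndpointNotReadyError (assumption for this sketch)."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      jitter=0.1, sleep=time.sleep):
    """Call fn(), retrying on not-ready errors with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except EndpointNotReadyError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            # Delay doubles each attempt (1s, 2s, 4s, ...) and is capped at max_delay
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little random jitter keeps many clients from retrying in lockstep
            sleep(delay + random.uniform(0, delay * jitter))
```

A call such as `call_with_backoff(lambda: client.infer(...))` would then replace a bare inference call. Suitable values for `max_attempts` and `max_delay` depend on how long cold starts take for the deployed model.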
Code: broken vs fixed
# Broken: the request is sent without handling a cold start
from runpod import RunPodClient

client = RunPodClient()
# This line raises RunPodEndpointNotReadyError while the container is starting
response = client.infer(model_id='my-model', input_data={'text': 'Hello'})
# Fixed: retry with a delay when the endpoint is not ready
import time

from runpod import RunPodClient
from runpod.client.errors import RunPodEndpointNotReadyError

client = RunPodClient()
for attempt in range(5):
    try:
        response = client.infer(model_id='my-model', input_data={'text': 'Hello'})
        print('Inference response:', response)
        break
    except RunPodEndpointNotReadyError:
        print('Endpoint not ready, retrying...')
        time.sleep(10)  # Wait 10 seconds before retrying
else:
    # Runs only when the loop finished without break, i.e. every attempt failed
    print('Failed to get response after retries')
Workaround
Catch RunPodEndpointNotReadyError exceptions and parse the error message to trigger a wait-and-retry mechanism instead of failing immediately.
Prevention
Use RunPod health checks or readiness probes to confirm endpoint availability before sending traffic, and allocate sufficient resources to minimize cold start delays.
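A readiness gate can be sketched as a polling loop with a deadline. The source does not name a specific health check call, so `is_ready` below is a hypothetical callable that you would back with whatever health or status check your deployment exposes. `wait_until_ready` and its parameters are likewise illustrative names.

```python
import time


def wait_until_ready(is_ready, timeout=300.0, interval=5.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or the timeout (seconds) elapses.

    Returns True if the endpoint became ready, False if the deadline passed.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)  # wait before the next readiness check
```

Traffic would be routed to the endpoint only after `wait_until_ready(...)` returns True. A False result is a signal to alert or to fall back rather than to send requests that are likely to fail.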