ReplicatePredictionTimeoutError
replicate.exceptions.ReplicatePredictionTimeoutError
Stack trace
replicate.exceptions.ReplicatePredictionTimeoutError: Prediction request timed out after waiting for the model to complete.
Why it happens
Replicate predictions have a maximum allowed runtime. If the model takes longer than this limit to produce a result, the request is aborted and the client raises this error. Common triggers are large inputs, slow models, and network delays.
Detection
Monitor prediction response times and catch ReplicatePredictionTimeoutError exceptions to log and identify slow or stuck predictions before they block your workflow.
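A minimal sketch of this kind of monitoring, assuming the prediction is wrapped in a zero-argument callable. The names `monitored_call`, `slow_after_s`, and the stand-in `PredictionTimeout` class are hypothetical; in real code you would pass the exception class named on this page as `timeout_exc` after confirming its import path in your installed client.

```python
import logging
import time

logger = logging.getLogger("prediction_monitor")


class PredictionTimeout(Exception):
    """Stand-in for the SDK's timeout exception (hypothetical name)."""


def monitored_call(run_prediction, timeout_exc=PredictionTimeout, slow_after_s=30.0):
    """Run a prediction callable, log how long it took, and flag timeouts.

    Returns a tuple (result, elapsed_seconds, timed_out).
    """
    start = time.monotonic()
    try:
        result = run_prediction()
    except timeout_exc:
        elapsed = time.monotonic() - start
        logger.warning("prediction timed out after %.2fs", elapsed)
        return None, elapsed, True
    elapsed = time.monotonic() - start
    if elapsed > slow_after_s:
        # Slow but successful: worth logging before it becomes a timeout.
        logger.warning("slow prediction: %.2fs", elapsed)
    else:
        logger.info("prediction finished in %.2fs", elapsed)
    return result, elapsed, False
```

Logging slow-but-successful calls separately from timeouts makes it easier to spot predictions that are drifting toward the limit.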
Causes & fixes
Cause: Model inference takes longer than the default or configured timeout period.
Fix: Increase the timeout parameter in the prediction call if supported, or optimize input size and model choice to reduce runtime.
Cause: Network latency or connectivity issues delay the prediction response beyond the timeout.
Fix: Ensure stable network connectivity and retry the prediction request with exponential backoff on timeout errors.
Cause: Using a large or complex input that causes the model to exceed runtime limits.
Fix: Simplify or reduce input data size, or split inputs into smaller batches to keep prediction times within limits.
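The retry-with-exponential-backoff fix could look roughly like the sketch below. It is not part of the Replicate client: `retry_with_backoff`, its parameters, and the stand-in `PredictionTimeout` class are illustrative assumptions, and `sleep` is injectable so the logic can be exercised without real waiting.

```python
import random
import time


class PredictionTimeout(Exception):
    """Stand-in for the SDK's timeout exception (hypothetical name)."""


def retry_with_backoff(call, timeout_exc=PredictionTimeout, max_attempts=4,
                       base_delay_s=1.0, max_delay_s=30.0, sleep=time.sleep):
    """Call `call()` until it succeeds or `max_attempts` timeouts have occurred.

    The delay doubles after each timeout (capped at `max_delay_s`), with random
    jitter of up to half the delay added so that parallel clients do not
    retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except timeout_exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the timeout to the caller
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

Only the timeout exception is retried here; other errors propagate immediately, since retrying them rarely helps.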
Code: broken vs fixed
# Broken: no timeout configured, API token hard-coded
import replicate

client = replicate.Client(api_token="my-token")

# This call may time out if the model is slow or the input is large
prediction = client.predictions.create(
    version="model-version",
    input={"image": "https://example.com/image.png"},
)  # May raise ReplicatePredictionTimeoutError
print(prediction.output)

# Fixed: longer timeout, API token read from the environment
import os
import replicate

client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])

# Timeout parameter added to allow a longer prediction time
# (only if your installed client version supports it on this call)
prediction = client.predictions.create(
    version="model-version",
    input={"image": "https://example.com/image.png"},
    timeout=120,  # Increased timeout to 120 seconds
)
print(prediction.output)  # Completes if the prediction finishes within 120 seconds
Workaround
Wrap the prediction call in a try/except block catching ReplicatePredictionTimeoutError, then retry the request with a longer timeout or smaller input batch.
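One way to sketch the "retry with a longer timeout" part of this workaround, assuming the prediction is wrapped in a callable that takes the timeout in seconds. `run_with_escalating_timeout`, the default timeout values, and the stand-in `PredictionTimeout` class are hypothetical; how the timeout is actually passed to the client depends on your installed version.

```python
class PredictionTimeout(Exception):
    """Stand-in for the SDK's timeout exception (hypothetical name)."""


def run_with_escalating_timeout(call, timeouts_s=(60, 120, 300),
                                timeout_exc=PredictionTimeout):
    """Try `call(timeout_s)` with each timeout in turn.

    Returns the first successful result. If every attempt times out,
    the last timeout exception is re-raised.
    """
    if not timeouts_s:
        raise ValueError("timeouts_s must contain at least one value")
    last_error = None
    for timeout_s in timeouts_s:
        try:
            return call(timeout_s)
        except timeout_exc as err:
            last_error = err  # remember it and move on to the next, longer timeout
    raise last_error
```

A call site might pass something like `lambda t: make_prediction(inputs, timeout=t)`, where `make_prediction` is your own wrapper around the client call.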
Prevention
Design your application to handle prediction timeouts gracefully by setting appropriate timeouts, optimizing inputs, and implementing retries with backoff to avoid blocking on slow model responses.
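As an illustration of not blocking on slow responses, the sketch below splits inputs into small batches and records batches that time out instead of stopping the whole run. `chunked`, `process_in_batches`, `predict_batch`, and the stand-in `PredictionTimeout` class are hypothetical names, and whether a model accepts batched input at all depends on the model.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


class PredictionTimeout(Exception):
    """Stand-in for the SDK's timeout exception (hypothetical name)."""


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(
    items: Sequence,
    predict_batch: Callable[[Sequence], List],
    batch_size: int = 4,
    timeout_exc: type = PredictionTimeout,
) -> Tuple[List, List[List]]:
    """Run `predict_batch` on each batch; collect timed-out batches separately.

    Returns (results, failed_batches) so the caller can retry or report the
    failures later without losing the batches that succeeded.
    """
    results: List = []
    failed: List[List] = []
    for batch in chunked(items, batch_size):
        try:
            results.extend(predict_batch(batch))
        except timeout_exc:
            failed.append(list(batch))
    return results, failed
```

The failed batches can then be retried with a smaller `batch_size` or handed to a background job.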