Replicate predictions API explained
Quick answer
The Replicate predictions API lets you run inference on machine learning models hosted on Replicate's platform by sending input parameters and receiving output predictions. You use the replicate Python package or the HTTP API to create prediction jobs and retrieve results synchronously or asynchronously.
Prerequisites
- Python 3.8+
- A Replicate API token (set the REPLICATE_API_TOKEN environment variable)
- pip install replicate
Setup
Install the official replicate Python package and set your API token as an environment variable for authentication.
pip install replicate

output

Collecting replicate
  Downloading replicate-0.10.0-py3-none-any.whl (20 kB)
Installing collected packages: replicate
Successfully installed replicate-0.10.0
Step by step
This example shows how to create a prediction job with the Replicate Python SDK, wait for completion, and print the output.
import os
import replicate
# Ensure your API token is set in the environment
# export REPLICATE_API_TOKEN="your_token_here"
client = replicate.Client()
# Specify the model and input parameters
model = client.models.get("stability-ai/stable-diffusion")
version = model.versions.get("db21e45d73e9ab0a5b1a5c6a1b3f9a7f3b9a5e1a4a5a6a7a8a9a0a1a2a3a4a5a")
inputs = {
    "prompt": "A futuristic cityscape at sunset",
    "width": 512,
    "height": 512,
    "num_inference_steps": 50,
}
# Create a prediction
prediction = client.predictions.create(version=version, input=inputs)
# Wait for the prediction to complete
prediction.wait()
# Print the output URL(s)
print("Prediction output:", prediction.output)

output

Prediction output: ['https://replicate.delivery/pbxt/abc123...']
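If you prefer explicit control over waiting instead of calling prediction.wait(), you can poll the prediction's status yourself. The sketch below is a hypothetical helper, not part of the SDK; it assumes the prediction object exposes a status attribute and a reload() method, as the SDK's prediction objects do.

```python
import time

# Terminal states reported by the Replicate predictions API
TERMINAL_STATES = {"succeeded", "failed", "canceled"}

def poll_prediction(prediction, interval=1.0, timeout=300.0):
    """Poll until the prediction reaches a terminal state or times out.

    `prediction` is assumed to expose `.status` and `.reload()`,
    like the SDK's prediction objects.
    """
    waited = 0.0
    while prediction.status not in TERMINAL_STATES:
        if waited >= timeout:
            raise TimeoutError("prediction did not finish in time")
        time.sleep(interval)
        waited += interval
        prediction.reload()  # refresh status from the API
    return prediction.status
```

A timeout like this is useful in production code so a stuck prediction cannot block your process indefinitely.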
Common variations
You can use the replicate.run() shortcut for synchronous calls, or use the HTTP API directly with requests. Async usage requires custom async wrappers since the SDK is synchronous.
import replicate

# replicate.run() reads REPLICATE_API_TOKEN from the environment
# Synchronous shortcut
output = replicate.run(
"stability-ai/stable-diffusion:db21e45d73e9ab0a5b1a5c6a1b3f9a7f3b9a5e1a4a5a6a7a8a9a0a1a2a3a4a5a",
input={"prompt": "A dragon flying over mountains"}
)
print("Output URL:", output)

output

Output URL: ['https://replicate.delivery/pbxt/xyz789...']
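Calling the HTTP API directly is a matter of POSTing a JSON body to the predictions endpoint with your token in the Authorization header. The sketch below uses the standard library's urllib instead of requests so it has no extra dependency; the "Token" auth scheme and endpoint URL match Replicate's HTTP API docs at the time of writing, so check the current docs if authentication fails.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"

def build_request(version, inputs):
    """Assemble headers and JSON body for a create-prediction call."""
    token = os.environ.get("REPLICATE_API_TOKEN", "")
    headers = {
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",
    }
    body = {"version": version, "input": inputs}
    return headers, body

def create_prediction(version, inputs):
    """POST the prediction request and return the decoded JSON response."""
    headers, body = build_request(version, inputs)
    req = urllib.request.Request(
        API_URL, data=json.dumps(body).encode(), headers=headers
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The response JSON includes an id and status you can use to poll the prediction's detail endpoint until it completes.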
Troubleshooting
- If you get authentication errors, verify that your REPLICATE_API_TOKEN environment variable is set correctly.
- For timeout or network errors, check your internet connection and retry.
- If the prediction status is "failed", inspect prediction.error for details.
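The status and error checks above can be bundled into a small helper. This is a hypothetical summarize function, not part of the SDK; it assumes the prediction object exposes status, error, and output attributes, as the SDK's prediction objects do.

```python
def summarize(prediction):
    """Return a short human-readable summary of a prediction's state.

    Assumes the object exposes `.status`, `.error`, and `.output`,
    like the SDK's prediction objects.
    """
    if prediction.status == "succeeded":
        return f"succeeded: {prediction.output}"
    if prediction.status == "failed":
        return f"failed: {prediction.error}"
    return f"still {prediction.status}"
```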
Key Takeaways
- Use the official replicate Python package with your API token set in REPLICATE_API_TOKEN.
- Create predictions by specifying a model version and input parameters, then wait for completion to get results.
- The replicate.run() method offers a simple synchronous interface for quick predictions.
- Check prediction.error and your environment variables if you encounter failures or authentication issues.