How to deploy a custom model on Replicate
Quick answer
To deploy a custom model on Replicate, first create a replicate.yaml manifest describing your model and Docker environment, then push your code to a GitHub repository. Use the replicate CLI to build and publish your model, making it accessible via the Replicate API. You can then run inference by calling model.predict() with your model reference.

Prerequisites

- Python 3.8+
- GitHub account with a repository for your model
- Docker installed locally
- pip install replicate
- Replicate account and API token
Setup
Install the replicate Python package and CLI, set your API token as an environment variable, and prepare your model repository with a replicate.yaml manifest and Dockerfile.
pip install replicate
export REPLICATE_API_TOKEN="your_replicate_api_token"

Output:

Collecting replicate
  Downloading replicate-0.10.0-py3-none-any.whl (40 kB)
Installing collected packages: replicate
Successfully installed replicate-0.10.0
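Before calling the API, it can help to fail fast when the token from the setup step is missing, rather than hitting an authentication error mid-run. A minimal sketch (require_token is a hypothetical helper, not part of the replicate package):

```python
import os

# Fail fast if the API token from the setup step is not set.
# This helper is illustrative, not part of the replicate package.
def require_token() -> str:
    token = os.getenv("REPLICATE_API_TOKEN")
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; "
            'run: export REPLICATE_API_TOKEN="your_replicate_api_token"'
        )
    return token
```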
Step by step
Create a replicate.yaml file in your model repo describing the model and Docker environment. Then use the replicate CLI to build and publish your model. Finally, run inference using the Python SDK.
### replicate.yaml example
name: username/my-custom-model
version: "1.0"
docker:
  build:
    context: .
  command: python predict.py
# predict.py example
import json
import sys

# Read a JSON request from stdin, uppercase the "text" field,
# and write the JSON response to stdout.
input_data = json.loads(sys.stdin.read())
output = {"result": input_data["text"].upper()}
print(json.dumps(output))
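Before building and publishing, you can exercise the same stdin/stdout contract locally. A self-contained sketch that pipes a JSON request through the predict.py body from above:

```python
import json
import os
import subprocess
import sys
import tempfile

# The same predict.py body as above, written to a temp file so the
# check is self-contained.
SCRIPT = """\
import sys, json
input_data = json.loads(sys.stdin.read())
output = {"result": input_data["text"].upper()}
print(json.dumps(output))
"""

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(SCRIPT)
    path = f.name

# Pipe a JSON request into the script, exactly as the runtime would.
proc = subprocess.run(
    [sys.executable, path],
    input=json.dumps({"text": "hello world"}),
    capture_output=True,
    text=True,
    check=True,
)
os.unlink(path)
print(proc.stdout.strip())  # → {"result": "HELLO WORLD"}
```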
# Build and publish model
replicate build
# Python inference example
# Python inference example
import replicate

# The replicate client reads REPLICATE_API_TOKEN from the environment.
model = replicate.models.get("username/my-custom-model")
output = model.predict(text="hello world")
print(output)

Output:
Building Docker image...
Pushing model version...
Model published as username/my-custom-model:1.0
{'result': 'HELLO WORLD'}

Common variations
- Use replicate.yaml to specify different Docker commands or dependencies.
- Run inference asynchronously with await model.predict_async() in async Python.
- Deploy models from different Git branches or tags by specifying the version in replicate.yaml.
import asyncio
import replicate

# The replicate client reads REPLICATE_API_TOKEN from the environment.
async def async_infer():
    model = replicate.models.get("username/my-custom-model")
    output = await model.predict_async(text="async call")
    print(output)

asyncio.run(async_infer())

Output:

{'result': 'ASYNC CALL'}

Troubleshooting
- If replicate build fails, check your Dockerfile and replicate.yaml syntax.
- Ensure your API token is set correctly in the REPLICATE_API_TOKEN environment variable.
- For authentication errors, verify your Replicate account permissions and token validity.
- If inference returns errors, confirm your predict.py script reads input from stdin and outputs valid JSON.
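The last point can be checked mechanically: captured predict.py output should parse as a single JSON object. A small sketch (validate_output is an illustrative helper, not part of any SDK):

```python
import json

# Sanity-check captured predict.py output against the contract above:
# a single valid JSON object on stdout.
def validate_output(raw: str) -> dict:
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"predict.py did not emit valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    return parsed

print(validate_output('{"result": "HELLO WORLD"}'))
```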
Key Takeaways
- Use a replicate.yaml manifest and Dockerfile to define your custom model environment.
- Build and publish your model with the replicate CLI to make it available via the API.
- Run inference using the replicate Python SDK with your model reference.
- Set your REPLICATE_API_TOKEN environment variable for authentication.
- Troubleshoot build and inference errors by validating your Docker setup and input/output JSON formats.