How-to · Intermediate · 4 min read

How to deploy a custom model on Replicate

Quick answer
To deploy a custom model on Replicate, first create a replicate.yaml manifest describing your model and Docker environment, then push your code to a GitHub repository. Use the replicate CLI to build and publish your model, making it accessible via the Replicate API. You can then run inference by calling model.predict() with your model reference.

Prerequisites

  • Python 3.8+
  • GitHub account with repository for your model
  • Docker installed locally
  • pip install replicate
  • Replicate account and API token

Setup

Install the replicate Python package and CLI, set your API token as an environment variable, and prepare your model repository with a replicate.yaml manifest and Dockerfile.

bash
pip install replicate
export REPLICATE_API_TOKEN="your_replicate_api_token"
output
Collecting replicate
  Downloading replicate-0.10.0-py3-none-any.whl (40 kB)
Installing collected packages: replicate
Successfully installed replicate-0.10.0
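A missing or empty token is the most common setup mistake, so it can help to fail fast before any API call. This is a sketch, not part of the replicate SDK; the helper name is illustrative:

```python
import os

def require_token(env=None):
    """Fail fast with a clear error if the Replicate API token is missing."""
    env = os.environ if env is None else env
    token = env.get("REPLICATE_API_TOKEN")
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; export it before running inference"
        )
    return token
```

Calling this once at startup turns a confusing authentication failure deep inside the client into an immediate, readable error.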

Step by step

Create a replicate.yaml file in your model repo describing the model and Docker environment. Then use the replicate CLI to build and publish your model. Finally, run inference using the Python SDK.

yaml
# replicate.yaml example
name: username/my-custom-model
version: "1.0"
docker:
  build:
    context: .
  command: python predict.py

python
# predict.py example: read a JSON payload from stdin, write a JSON result to stdout
import json
import sys

input_data = json.loads(sys.stdin.read())
output = {"result": input_data["text"].upper()}
print(json.dumps(output))

bash
# Build and publish the model
replicate build
output
Building Docker image...
Pushing model version...
Model published as username/my-custom-model:1.0

python
# Python inference example; the replicate client reads REPLICATE_API_TOKEN
# from the environment, so there is no need to set it again in code
import replicate

model = replicate.models.get("username/my-custom-model")
output = model.predict(text="hello world")
print(output)
output
{'result': 'HELLO WORLD'}
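Because the model's contract is plain JSON over stdin/stdout, predict.py can be smoke-tested locally before publishing. A sketch that pipes a payload through the example script (the inline SCRIPT string mirrors the predict.py shown above; the helper name is illustrative):

```python
import json
import subprocess
import sys
import tempfile

# Same logic as the predict.py example above
SCRIPT = """
import sys, json
input_data = json.loads(sys.stdin.read())
output = {"result": input_data["text"].upper()}
print(json.dumps(output))
"""

def run_predict_locally(payload):
    """Pipe a JSON payload through the predict script and parse its JSON output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(SCRIPT)
        path = f.name
    proc = subprocess.run(
        [sys.executable, path],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)

print(run_predict_locally({"text": "hello world"}))  # {'result': 'HELLO WORLD'}
```

Running this before each build catches malformed JSON or a missing input key without waiting for a Docker image to build.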

Common variations

  • Use replicate.yaml to specify different Docker commands or dependencies.
  • Run inference asynchronously with await model.predict_async() in async Python.
  • Deploy models from different Git branches or tags by specifying version in replicate.yaml.
python
import asyncio
import replicate

# The replicate client reads REPLICATE_API_TOKEN from the environment automatically

async def async_infer():
    model = replicate.models.get("username/my-custom-model")
    output = await model.predict_async(text="async call")
    print(output)

asyncio.run(async_infer())
output
{'result': 'ASYNC CALL'}
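When many inputs need the async variant, the calls can be fanned out concurrently with asyncio.gather. The sketch below stubs the network call with a local coroutine so it runs offline; in real use you would substitute the model's predict_async call from the example above:

```python
import asyncio

async def fake_predict(text):
    # Stand-in for the real async prediction call, which would hit the Replicate API
    await asyncio.sleep(0)
    return {"result": text.upper()}

async def infer_many(texts):
    """Run several predictions concurrently and return results in input order."""
    return await asyncio.gather(*(fake_predict(t) for t in texts))

results = asyncio.run(infer_many(["first call", "second call"]))
print(results)  # [{'result': 'FIRST CALL'}, {'result': 'SECOND CALL'}]
```

asyncio.gather preserves input order in its results, so outputs can be zipped back to their inputs without extra bookkeeping.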

Troubleshooting

  • If replicate build fails, check your Dockerfile and replicate.yaml syntax.
  • Ensure your API token is set correctly in the REPLICATE_API_TOKEN environment variable.
  • For authentication errors, verify your Replicate account permissions and token validity.
  • If inference returns errors, confirm your predict.py script reads input from stdin and outputs valid JSON.
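The last point can be checked mechanically: capture the script's raw stdout and confirm it parses as a JSON object before publishing. A minimal sketch (the function name is illustrative, not part of any SDK):

```python
import json

def validate_prediction_output(raw):
    """Return the parsed output if it is a JSON object, else raise a descriptive error."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"predict.py did not emit valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError(f"expected a JSON object, got {type(parsed).__name__}")
    return parsed

print(validate_prediction_output('{"result": "HELLO WORLD"}'))
```

Feeding this the captured stdout of a local test run distinguishes "invalid JSON" from "valid JSON but wrong shape", which are the two usual causes of inference errors here.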

Key takeaways

  • Use a replicate.yaml manifest and Dockerfile to define your custom model environment.
  • Build and publish your model with the replicate CLI to make it available via API.
  • Run inference using the replicate Python SDK with your model reference.
  • Set your REPLICATE_API_TOKEN environment variable for authentication.
  • Troubleshoot build and inference errors by validating Docker setup and input/output JSON formats.
Verified 2026-04 · username/my-custom-model