How-to · Intermediate · 4 min read

How to deploy a custom model on Replicate

Quick answer
To deploy a custom model on Replicate, first create a replicate.yaml manifest describing your model and Docker environment, then push your code to a GitHub repository. Use the replicate CLI to build and publish your model, making it accessible via the Replicate API. You can then run inference by calling model.predict() with your model reference.

Prerequisites

  • Python 3.8+
  • GitHub account with repository for your model
  • Docker installed locally
  • pip install replicate
  • Replicate account and API token

Setup

Install the replicate Python package and CLI, set your API token as an environment variable, and prepare your model repository with a replicate.yaml manifest and Dockerfile.

bash
pip install replicate
export REPLICATE_API_TOKEN="your_replicate_api_token"
output
Collecting replicate
  Downloading replicate-0.10.0-py3-none-any.whl (40 kB)
Installing collected packages: replicate
Successfully installed replicate-0.10.0
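A missing or empty token is the most common setup mistake, so it can help to fail fast before any API call. This is a sketch, not part of the replicate SDK; the helper name is illustrative:

```python
import os

def require_token(env=None):
    """Fail fast with a clear error if the Replicate API token is missing."""
    env = os.environ if env is None else env
    token = env.get("REPLICATE_API_TOKEN")
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; export it before running inference"
        )
    return token
```

Calling this once at startup turns a confusing authentication failure deep inside the client into an immediate, readable error.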

Step by step

Create a replicate.yaml file in your model repo describing the model and Docker environment. Then use the replicate CLI to build and publish your model. Finally, run inference using the Python SDK.

yaml
# replicate.yaml example
name: username/my-custom-model
version: "1.0"
docker:
  build:
    context: .
  command: python predict.py

python
# predict.py example: read a JSON payload from stdin, write a JSON result to stdout
import json
import sys

input_data = json.loads(sys.stdin.read())
output = {"result": input_data["text"].upper()}
print(json.dumps(output))

bash
# Build and publish the model
replicate build
output
Building Docker image...
Pushing model version...
Model published as username/my-custom-model:1.0

python
# Python inference example; the replicate client reads REPLICATE_API_TOKEN
# from the environment, so there is no need to set it again in code
import replicate

model = replicate.models.get("username/my-custom-model")
output = model.predict(text="hello world")
print(output)
output
{'result': 'HELLO WORLD'}
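Because the model's contract is plain JSON over stdin/stdout, predict.py can be smoke-tested locally before publishing. A sketch that pipes a payload through the example script (the inline SCRIPT string mirrors the predict.py shown above; the helper name is illustrative):

```python
import json
import subprocess
import sys
import tempfile

# Same logic as the predict.py example above
SCRIPT = """
import sys, json
input_data = json.loads(sys.stdin.read())
output = {"result": input_data["text"].upper()}
print(json.dumps(output))
"""

def run_predict_locally(payload):
    """Pipe a JSON payload through the predict script and parse its JSON output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(SCRIPT)
        path = f.name
    proc = subprocess.run(
        [sys.executable, path],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)

print(run_predict_locally({"text": "hello world"}))  # {'result': 'HELLO WORLD'}
```

Running this before each build catches malformed JSON or a missing input key without waiting for a Docker image to build.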

Common variations

  • Use replicate.yaml to specify different Docker commands or dependencies.
  • Run inference asynchronously with await model.predict_async() in async Python.
  • Deploy models from different Git branches or tags by specifying version in replicate.yaml.
python
import asyncio
import replicate

# The replicate client reads REPLICATE_API_TOKEN from the environment automatically

async def async_infer():
    model = replicate.models.get("username/my-custom-model")
    output = await model.predict_async(text="async call")
    print(output)

asyncio.run(async_infer())
output
{'result': 'ASYNC CALL'}
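When many inputs need the async variant, the calls can be fanned out concurrently with asyncio.gather. The sketch below stubs the network call with a local coroutine so it runs offline; in real use you would substitute the model's predict_async call from the example above:

```python
import asyncio

async def fake_predict(text):
    # Stand-in for the real async prediction call, which would hit the Replicate API
    await asyncio.sleep(0)
    return {"result": text.upper()}

async def infer_many(texts):
    """Run several predictions concurrently and return results in input order."""
    return await asyncio.gather(*(fake_predict(t) for t in texts))

results = asyncio.run(infer_many(["first call", "second call"]))
print(results)  # [{'result': 'FIRST CALL'}, {'result': 'SECOND CALL'}]
```

asyncio.gather preserves input order in its results, so outputs can be zipped back to their inputs without extra bookkeeping.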

Troubleshooting

  • If replicate build fails, check your Dockerfile and replicate.yaml syntax.
  • Ensure your API token is set correctly in the REPLICATE_API_TOKEN environment variable.
  • For authentication errors, verify your Replicate account permissions and token validity.
  • If inference returns errors, confirm your predict.py script reads input from stdin and outputs valid JSON.
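The last point can be checked mechanically: capture the script's raw stdout and confirm it parses as a JSON object before publishing. A minimal sketch (the function name is illustrative, not part of any SDK):

```python
import json

def validate_prediction_output(raw):
    """Return the parsed output if it is a JSON object, else raise a descriptive error."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"predict.py did not emit valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError(f"expected a JSON object, got {type(parsed).__name__}")
    return parsed

print(validate_prediction_output('{"result": "HELLO WORLD"}'))
```

Feeding this the captured stdout of a local test run distinguishes "invalid JSON" from "valid JSON but wrong shape", which are the two usual causes of inference errors here.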

Key takeaways

  • Use a replicate.yaml manifest and Dockerfile to define your custom model environment.
  • Build and publish your model with the replicate CLI to make it available via API.
  • Run inference using the replicate Python SDK with your model reference.
  • Set your REPLICATE_API_TOKEN environment variable for authentication.
  • Troubleshoot build and inference errors by validating Docker setup and input/output JSON formats.
Verified 2026-04 · username/my-custom-model