How to run a model on Replicate in Python
Direct answer
Use the <code>replicate</code> Python package to run models by calling <code>replicate.run()</code> with the model name and input parameters, authenticating via the <code>REPLICATE_API_TOKEN</code> environment variable.

Setup

Install

```shell
pip install replicate
```

Env vars

<code>REPLICATE_API_TOKEN</code>

Imports

```python
import os
import replicate
```

Examples
in: Run meta/meta-llama-3-8b-instruct with prompt 'Hello, how are you?' and max_tokens 512
out: Hello! I'm doing great, how can I assist you today?

in: Run stability-ai/sdxl with prompt 'A futuristic cityscape at sunset'
out: ['https://replicate.delivery/pbxt/abc123/image.png']

in: Run meta/meta-llama-3-8b-instruct with empty prompt
out: Error or empty response, depending on model behavior
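All of the examples above assume <code>REPLICATE_API_TOKEN</code> is set; a minimal fail-fast check can catch a missing token before any model call (a sketch — <code>require_token</code> is a hypothetical helper, not part of the replicate package):

```python
import os

def require_token(env=os.environ):
    """Return the Replicate API token from env, raising early if it is missing."""
    token = env.get("REPLICATE_API_TOKEN")
    if not token:
        raise RuntimeError("Set REPLICATE_API_TOKEN before calling replicate.run()")
    return token
```

Calling <code>require_token()</code> at startup turns a confusing mid-run authentication error into an immediate, readable failure.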
Integration steps
- Install the replicate package and set the REPLICATE_API_TOKEN environment variable
- Import replicate and os modules in your Python script
- Call replicate.run() with the model identifier and input parameters as a dictionary
- Capture the output returned by replicate.run()
- Use or display the output as needed in your application
Full code

```python
import os
import replicate

# Ensure your REPLICATE_API_TOKEN is set in the environment
# Example: export REPLICATE_API_TOKEN='your_token_here'

def run_replicate_model():
    model_name = "meta/meta-llama-3-8b-instruct"
    inputs = {
        "prompt": "Hello, how are you?",
        "max_tokens": 512,
    }
    # Run the model
    output = replicate.run(model_name, input=inputs)
    print("Model output:", output)

if __name__ == "__main__":
    run_replicate_model()
```

Output

Model output: Hello! I'm doing great, how can I assist you today?
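The output printed above is a single string, but for language models <code>replicate.run()</code> often returns the text as a list of chunks rather than one string. A small normalizer smooths this over (a sketch — <code>join_chunks</code> is illustrative, not part of the replicate package):

```python
def join_chunks(output):
    """Normalize model output: join a list of text chunks into one string."""
    if isinstance(output, list):
        return "".join(str(chunk) for chunk in output)
    return str(output)
```

Usage: <code>print("Model output:", join_chunks(output))</code>.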
API trace

Request

```json
{"model": "meta/meta-llama-3-8b-instruct", "input": {"prompt": "Hello, how are you?", "max_tokens": 512}}
```

Response

```json
{"id": "run_xyz", "output": "Hello! I'm doing great, how can I assist you today?", "logs": "...", "version": "..."}
```

Extract

```python
output = replicate.run(model_name, input=inputs)
```

Variants
Async version using replicate.async_run ›

Use when you want to run multiple Replicate calls concurrently or integrate with async frameworks.

```python
import asyncio

import replicate

async def run_async():
    model_name = "meta/meta-llama-3-8b-instruct"
    inputs = {"prompt": "Hello async world!", "max_tokens": 256}
    output = await replicate.async_run(model_name, input=inputs)
    print("Async output:", output)

if __name__ == "__main__":
    asyncio.run(run_async())
```

Image generation with stability-ai/sdxl ›
Use this variant for image generation tasks on Replicate.

```python
import replicate

model_name = "stability-ai/sdxl"
inputs = {"prompt": "A beautiful sunset over mountains"}
output = replicate.run(model_name, input=inputs)
print("Generated image URLs:", output)
```

Run with custom version of a model ›
Use when you want to pin a particular model version by appending its version hash after a colon.

```python
import replicate

# The identifier after the colon pins a specific model version
model_version = "meta/meta-llama-3-8b-instruct:1234567890abcdef"
inputs = {"prompt": "Custom version prompt", "max_tokens": 128}
output = replicate.run(model_version, input=inputs)
print("Output from custom version:", output)
```

Performance
Latency: ~1-5 seconds, depending on model complexity and input size
Cost: varies by model; check Replicate pricing for per-model usage
Rate limits: depend on your Replicate account tier; typically a few hundred requests per minute
- Limit <code>max_tokens</code> to reduce cost and latency
- Use concise prompts to minimize input size
- Cache frequent outputs to avoid repeated calls
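The caching tip above can be sketched with the standard library's <code>functools.lru_cache</code>. This assumes repeated identical calls may safely reuse an earlier result; <code>make_cached_runner</code> is a hypothetical helper and <code>run_fn</code> stands in for <code>replicate.run</code>:

```python
from functools import lru_cache

def make_cached_runner(run_fn, maxsize=128):
    """Wrap a run function so repeated (model, prompt, max_tokens) calls hit a cache."""
    @lru_cache(maxsize=maxsize)
    def cached(model_name, prompt, max_tokens):
        return run_fn(model_name, input={"prompt": prompt, "max_tokens": max_tokens})
    return cached
```

Note this only fits use cases where serving a previously generated output for the same prompt is acceptable.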
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Synchronous replicate.run() | ~1-5s | Model-dependent | Simple scripts and blocking calls |
| Asynchronous replicate.async_run() | ~1-5s | Model-dependent | Concurrent calls and async apps |
| Specifying model version | ~1-5s | Model-dependent | Reproducible results with fixed model versions |
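The async row in the table pays off when fanning out several prompts at once. A sketch using <code>asyncio.gather</code>, where <code>async_run_fn</code> stands in for <code>replicate.async_run</code>:

```python
import asyncio

async def run_many(async_run_fn, model_name, prompts):
    """Launch one model call per prompt and await them all concurrently."""
    tasks = [async_run_fn(model_name, input={"prompt": p}) for p in prompts]
    return await asyncio.gather(*tasks)
```

Total wall-clock time is then roughly that of the slowest single call, rather than the sum of all calls.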
Quick tip
Always set your <code>REPLICATE_API_TOKEN</code> in the environment before making Replicate calls to avoid authentication errors.
Common mistake
Beginners often forget to set the <code>REPLICATE_API_TOKEN</code> environment variable, causing authentication failures.
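Beyond authentication, transient failures such as the rate limits noted under Performance are common in practice. A retry wrapper with exponential backoff is one way to handle them (a sketch — <code>run_with_retry</code> is a hypothetical helper, and <code>run_fn</code> stands in for <code>replicate.run</code>):

```python
import time

def run_with_retry(run_fn, model_name, inputs, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call run_fn, retrying with exponential backoff on transient errors."""
    for attempt in range(retries):
        try:
            return run_fn(model_name, input=inputs)
        except Exception:
            if attempt == retries - 1:
                raise
            # Back off: 1s, 2s, 4s, ... before the next attempt
            sleep(base_delay * (2 ** attempt))
```

In production you would catch the client's specific exception types rather than bare <code>Exception</code>, so genuine bugs are not silently retried.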