How to run a model on Replicate in Python
Direct answer
Use the <code>replicate</code> Python package to run models by calling <code>replicate.run()</code> with the model name and input parameters, authenticating via the <code>REPLICATE_API_TOKEN</code> environment variable.

Setup

Install

```shell
pip install replicate
```

Env vars

<code>REPLICATE_API_TOKEN</code>

Imports

```python
import os
import replicate
```

Examples
in: Run meta/meta-llama-3-8b-instruct with prompt 'Hello, how are you?' and max_tokens 512
out: Hello! I'm doing great, how can I assist you today?

in: Run stability-ai/sdxl with prompt 'A futuristic cityscape at sunset'
out: ['https://replicate.delivery/pbxt/abc123/image.png']

in: Run meta/meta-llama-3-8b-instruct with empty prompt
out: Error or empty response, depending on model behavior
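All of the examples above assume <code>REPLICATE_API_TOKEN</code> is set; a minimal fail-fast check can catch a missing token before any model call (a sketch — <code>require_token</code> is a hypothetical helper, not part of the replicate package):

```python
import os

def require_token(env=os.environ):
    """Return the Replicate API token from env, raising early if it is missing."""
    token = env.get("REPLICATE_API_TOKEN")
    if not token:
        raise RuntimeError("Set REPLICATE_API_TOKEN before calling replicate.run()")
    return token
```

Calling <code>require_token()</code> at startup turns a confusing mid-run authentication error into an immediate, readable failure.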
Integration steps
- Install the replicate package and set the REPLICATE_API_TOKEN environment variable
- Import replicate and os modules in your Python script
- Call replicate.run() with the model identifier and input parameters as a dictionary
- Capture the output returned by replicate.run()
- Use or display the output as needed in your application
Full code

```python
import os
import replicate

# Ensure your REPLICATE_API_TOKEN is set in the environment
# Example: export REPLICATE_API_TOKEN='your_token_here'

def run_replicate_model():
    model_name = "meta/meta-llama-3-8b-instruct"
    inputs = {
        "prompt": "Hello, how are you?",
        "max_tokens": 512,
    }
    # Run the model
    output = replicate.run(model_name, input=inputs)
    print("Model output:", output)

if __name__ == "__main__":
    run_replicate_model()
```

Output

Model output: Hello! I'm doing great, how can I assist you today?
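The output printed above is a single string, but for language models <code>replicate.run()</code> often returns the text as a list of chunks rather than one string. A small normalizer smooths this over (a sketch — <code>join_chunks</code> is illustrative, not part of the replicate package):

```python
def join_chunks(output):
    """Normalize model output: join a list of text chunks into one string."""
    if isinstance(output, list):
        return "".join(str(chunk) for chunk in output)
    return str(output)
```

Usage: <code>print("Model output:", join_chunks(output))</code>.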
API trace

Request

```json
{"model": "meta/meta-llama-3-8b-instruct", "input": {"prompt": "Hello, how are you?", "max_tokens": 512}}
```

Response

```json
{"id": "run_xyz", "output": "Hello! I'm doing great, how can I assist you today?", "logs": "...", "version": "..."}
```

Extract

```python
output = replicate.run(model_name, input=inputs)
```

Variants
Async version using replicate.async_run ›

Use when you want to run multiple Replicate calls concurrently or integrate with async frameworks.

```python
import asyncio

import replicate

async def run_async():
    model_name = "meta/meta-llama-3-8b-instruct"
    inputs = {"prompt": "Hello async world!", "max_tokens": 256}
    output = await replicate.async_run(model_name, input=inputs)
    print("Async output:", output)

if __name__ == "__main__":
    asyncio.run(run_async())
```

Image generation with stability-ai/sdxl ›
Use this variant for image generation tasks on Replicate.

```python
import replicate

model_name = "stability-ai/sdxl"
inputs = {"prompt": "A beautiful sunset over mountains"}
output = replicate.run(model_name, input=inputs)
print("Generated image URLs:", output)
```

Run with custom version of a model ›
Use when you want to pin a particular model version by appending its version hash after a colon.

```python
import replicate

# The identifier after the colon pins a specific model version
model_version = "meta/meta-llama-3-8b-instruct:1234567890abcdef"
inputs = {"prompt": "Custom version prompt", "max_tokens": 128}
output = replicate.run(model_version, input=inputs)
print("Output from custom version:", output)
```

Performance
Latency: ~1-5 seconds, depending on model complexity and input size
Cost: varies by model; check Replicate pricing for per-model usage
Rate limits: depend on your Replicate account tier; typically a few hundred requests per minute
- Limit <code>max_tokens</code> to reduce cost and latency
- Use concise prompts to minimize input size
- Cache frequent outputs to avoid repeated calls
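The caching tip above can be sketched with the standard library's <code>functools.lru_cache</code>. This assumes repeated identical calls may safely reuse an earlier result; <code>make_cached_runner</code> is a hypothetical helper and <code>run_fn</code> stands in for <code>replicate.run</code>:

```python
from functools import lru_cache

def make_cached_runner(run_fn, maxsize=128):
    """Wrap a run function so repeated (model, prompt, max_tokens) calls hit a cache."""
    @lru_cache(maxsize=maxsize)
    def cached(model_name, prompt, max_tokens):
        return run_fn(model_name, input={"prompt": prompt, "max_tokens": max_tokens})
    return cached
```

Note this only fits use cases where serving a previously generated output for the same prompt is acceptable.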
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Synchronous replicate.run() | ~1-5s | Model-dependent | Simple scripts and blocking calls |
| Asynchronous replicate.async_run() | ~1-5s | Model-dependent | Concurrent calls and async apps |
| Specifying model version | ~1-5s | Model-dependent | Reproducible results with fixed model versions |
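The async row in the table pays off when fanning out several prompts at once. A sketch using <code>asyncio.gather</code>, where <code>async_run_fn</code> stands in for <code>replicate.async_run</code>:

```python
import asyncio

async def run_many(async_run_fn, model_name, prompts):
    """Launch one model call per prompt and await them all concurrently."""
    tasks = [async_run_fn(model_name, input={"prompt": p}) for p in prompts]
    return await asyncio.gather(*tasks)
```

Total wall-clock time is then roughly that of the slowest single call, rather than the sum of all calls.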
Quick tip
Always set your <code>REPLICATE_API_TOKEN</code> in the environment before making Replicate calls to avoid authentication errors.
Common mistake
Beginners often forget to set the <code>REPLICATE_API_TOKEN</code> environment variable, causing authentication failures.
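Beyond authentication, transient failures such as the rate limits noted under Performance are common in practice. A retry wrapper with exponential backoff is one way to handle them (a sketch — <code>run_with_retry</code> is a hypothetical helper, and <code>run_fn</code> stands in for <code>replicate.run</code>):

```python
import time

def run_with_retry(run_fn, model_name, inputs, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call run_fn, retrying with exponential backoff on transient errors."""
    for attempt in range(retries):
        try:
            return run_fn(model_name, input=inputs)
        except Exception:
            if attempt == retries - 1:
                raise
            # Back off: 1s, 2s, 4s, ... before the next attempt
            sleep(base_delay * (2 ** attempt))
```

In production you would catch the client's specific exception types rather than bare <code>Exception</code>, so genuine bugs are not silently retried.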