Code beginner · 3 min read

How to use Replicate API in Python

Direct answer
Install the replicate Python package, set your REPLICATE_API_TOKEN environment variable, then call replicate.run() with a model identifier and an input dictionary to get AI model outputs.

Setup

Install
bash
pip install replicate
Env vars
REPLICATE_API_TOKEN
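The token can be exported in your shell before running any script. The value below is a placeholder, not a real token:

```shell
# Replace the placeholder with your real token from replicate.com
export REPLICATE_API_TOKEN="r8_xxxxxxxx"
```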
Imports
python
import os
import replicate

Examples

in  Generate a text completion for the prompt 'Hello, world!' using meta/meta-llama-3-8b-instruct
out Hello, world! How can I assist you today?
in  Generate an image with the prompt 'A futuristic cityscape at sunset' using stability-ai/sdxl
out ['https://replicate.delivery/pbxt/abc12345-fake-url.png']
in  Run a model with an empty prompt input
out Error or empty output, depending on the model's input requirements

Integration steps

  1. Install the replicate package via pip.
  2. Set your API token in the environment variable REPLICATE_API_TOKEN.
  3. Import replicate and os modules in your Python script.
  4. Call replicate.run() with the model identifier and input parameters as a dictionary.
  5. Capture and process the returned output from the model.
  6. Handle exceptions or errors if the model input is invalid or API call fails.
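Steps 4 and 5 boil down to building an input dictionary and reading back the result. A minimal sketch of the request payload, reusing the model name and parameters from the examples above (no API call is made here):

```python
import json

# Model identifier and parameters reused from the examples in this article
model_name = "meta/meta-llama-3-8b-instruct"
payload = {"prompt": "Hello, world!", "max_tokens": 512}

# This is the dictionary you would pass as input= to replicate.run()
print(json.dumps({"model": model_name, "input": payload}, indent=2))
```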

Full code

python
import os
import replicate

# Ensure your REPLICATE_API_TOKEN is set in environment variables
api_token = os.environ.get("REPLICATE_API_TOKEN")
if not api_token:
    raise ValueError("Set the REPLICATE_API_TOKEN environment variable.")

# Example: Run the meta-llama-3-8b-instruct model to generate text
model_name = "meta/meta-llama-3-8b-instruct"
prompt = "Hello, world!"

try:
    output = replicate.run(
        model_name,
        input={"prompt": prompt, "max_tokens": 512}
    )
    # Language models on Replicate stream output as a list of string
    # fragments, so join the pieces into a single string before printing
    print("Model output:", "".join(output))
except Exception as e:
    print("Error calling Replicate API:", e)
output
Model output: Hello, world! How can I assist you today?

API trace

Request
json
{"model": "meta/meta-llama-3-8b-instruct", "input": {"prompt": "Hello, world!", "max_tokens": 512}}
Response
json
{"output": "Hello, world! How can I assist you today?", "version": "v1", "logs": "..."}
Extract
output = replicate.run(model_name, input={...})
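Under the hood, the client POSTs a JSON body like the one above to Replicate's HTTP API. A sketch of that raw request using only the standard library, assuming the public predictions endpoint path; the token value is a placeholder, and the request is built but never sent:

```python
import json
import urllib.request

token = "r8_placeholder"  # hypothetical value; use your real token

body = json.dumps({
    "input": {"prompt": "Hello, world!", "max_tokens": 512}
}).encode("utf-8")

# Build (but do not send) the request the Python client makes for you
req = urllib.request.Request(
    "https://api.replicate.com/v1/models/meta/meta-llama-3-8b-instruct/predictions",
    data=body,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    },
    method="POST",
)
print(req.full_url)
```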

Variants

Async version

Use when you want to perform concurrent or non-blocking calls to Replicate models in async Python applications.

python
import os
import replicate
import asyncio

async def main():
    api_token = os.environ.get("REPLICATE_API_TOKEN")
    if not api_token:
        raise ValueError("Set the REPLICATE_API_TOKEN environment variable.")

    model_name = "meta/meta-llama-3-8b-instruct"
    prompt = "Hello, async world!"

    output = await replicate.async_run(
        model_name,
        input={"prompt": prompt, "max_tokens": 512}
    )
    # Join the streamed string fragments into a single string
    print("Async model output:", "".join(output))

asyncio.run(main())
Image generation example

Use this pattern when calling image generation models on Replicate.

python
import os
import replicate

api_token = os.environ.get("REPLICATE_API_TOKEN")
if not api_token:
    raise ValueError("Set the REPLICATE_API_TOKEN environment variable.")

model_name = "stability-ai/sdxl"
prompt = "A futuristic cityscape at sunset"

output = replicate.run(
    model_name,
    input={"prompt": prompt}
)
print("Generated image URLs:", output)

Performance

Latency: ~1-5 seconds depending on model complexity and server load
Cost: varies by model; check Replicate pricing per model
Rate limits: depend on your Replicate account plan; typically several requests per minute
  • Limit max_tokens or equivalent input parameters to reduce cost.
  • Reuse model versions to avoid overhead of loading new versions.
  • Batch inputs if supported by the model to optimize throughput.
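The "reuse model versions" tip refers to pinning a specific version in the model identifier, using the owner/name:version format. The hash below is a made-up placeholder, not a real version; you would pass the pinned string as the first argument to replicate.run():

```python
# Format: "<owner>/<name>:<version-hash>"; the hash here is a fake placeholder
pinned = "meta/meta-llama-3-8b-instruct:0123456789abcdef"

# Splitting on the first ":" recovers the two parts of the identifier
owner_and_name, _, version = pinned.partition(":")
print(owner_and_name)  # meta/meta-llama-3-8b-instruct
print(version)         # 0123456789abcdef
```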
Approach | Latency | Cost/call | Best for
Synchronous replicate.run() | ~1-5s | Model-dependent | Simple scripts and blocking calls
Asynchronous replicate.async_run() | ~1-5s | Model-dependent | Concurrent calls in async apps
Image generation with replicate.run() | ~3-10s | Higher for large models | Generating images or media

Quick tip

Always set REPLICATE_API_TOKEN in your environment, then call replicate.run() with a model identifier and an input dictionary for simple, direct calls.

Common mistake

Beginners often forget to set the REPLICATE_API_TOKEN environment variable, causing authentication errors.

Verified 2026-04 · meta/meta-llama-3-8b-instruct, stability-ai/sdxl