Code beginner · 3 min read

How to use Replicate API in Python

Direct answer
Install the replicate Python package, set your REPLICATE_API_TOKEN environment variable, then call replicate.run() with a model identifier and an input dictionary to get AI model outputs.

Setup

Install
bash
pip install replicate
Env vars
REPLICATE_API_TOKEN
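The token can be exported in your shell before running any script. The value below is a placeholder, not a real token:

```shell
# Replace the placeholder with your real token from replicate.com
export REPLICATE_API_TOKEN="r8_xxxxxxxx"
```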
Imports
python
import os
import replicate

Examples

in  Generate a text completion for the prompt 'Hello, world!' using meta/meta-llama-3-8b-instruct
out Hello, world! How can I assist you today?
in  Generate an image with the prompt 'A futuristic cityscape at sunset' using stability-ai/sdxl
out ['https://replicate.delivery/pbxt/abc12345-fake-url.png']
in  Run a model with an empty prompt input
out Error or empty output, depending on the model's input requirements

Integration steps

  1. Install the replicate package via pip.
  2. Set your API token in the environment variable REPLICATE_API_TOKEN.
  3. Import replicate and os modules in your Python script.
  4. Call replicate.run() with the model identifier and input parameters as a dictionary.
  5. Capture and process the returned output from the model.
  6. Handle exceptions or errors if the model input is invalid or API call fails.
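Steps 4 and 5 boil down to building an input dictionary and reading back the result. A minimal sketch of the request payload, reusing the model name and parameters from the examples above (no API call is made here):

```python
import json

# Model identifier and parameters reused from the examples in this article
model_name = "meta/meta-llama-3-8b-instruct"
payload = {"prompt": "Hello, world!", "max_tokens": 512}

# This is the dictionary you would pass as input= to replicate.run()
print(json.dumps({"model": model_name, "input": payload}, indent=2))
```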

Full code

python
import os
import replicate

# Ensure your REPLICATE_API_TOKEN is set in environment variables
api_token = os.environ.get("REPLICATE_API_TOKEN")
if not api_token:
    raise ValueError("Set the REPLICATE_API_TOKEN environment variable.")

# Example: Run the meta-llama-3-8b-instruct model to generate text
model_name = "meta/meta-llama-3-8b-instruct"
prompt = "Hello, world!"

try:
    output = replicate.run(
        model_name,
        input={"prompt": prompt, "max_tokens": 512}
    )
    # Language models on Replicate stream output as a list of string
    # fragments, so join the pieces into a single string before printing
    print("Model output:", "".join(output))
except Exception as e:
    print("Error calling Replicate API:", e)
output
Model output: Hello, world! How can I assist you today?

API trace

Request
json
{"model": "meta/meta-llama-3-8b-instruct", "input": {"prompt": "Hello, world!", "max_tokens": 512}}
Response
json
{"output": "Hello, world! How can I assist you today?", "version": "v1", "logs": "..."}
Extract
output = replicate.run(model_name, input={...})
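Under the hood, the client POSTs a JSON body like the one above to Replicate's HTTP API. A sketch of that raw request using only the standard library, assuming the public predictions endpoint path; the token value is a placeholder, and the request is built but never sent:

```python
import json
import urllib.request

token = "r8_placeholder"  # hypothetical value; use your real token

body = json.dumps({
    "input": {"prompt": "Hello, world!", "max_tokens": 512}
}).encode("utf-8")

# Build (but do not send) the request the Python client makes for you
req = urllib.request.Request(
    "https://api.replicate.com/v1/models/meta/meta-llama-3-8b-instruct/predictions",
    data=body,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    },
    method="POST",
)
print(req.full_url)
```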

Variants

Async version

Use when you want to perform concurrent or non-blocking calls to Replicate models in async Python applications.

python
import os
import replicate
import asyncio

async def main():
    api_token = os.environ.get("REPLICATE_API_TOKEN")
    if not api_token:
        raise ValueError("Set the REPLICATE_API_TOKEN environment variable.")

    model_name = "meta/meta-llama-3-8b-instruct"
    prompt = "Hello, async world!"

    output = await replicate.async_run(
        model_name,
        input={"prompt": prompt, "max_tokens": 512}
    )
    # Join the streamed string fragments into a single string
    print("Async model output:", "".join(output))

asyncio.run(main())
Image generation example

Use this pattern when calling image generation models on Replicate.

python
import os
import replicate

api_token = os.environ.get("REPLICATE_API_TOKEN")
if not api_token:
    raise ValueError("Set the REPLICATE_API_TOKEN environment variable.")

model_name = "stability-ai/sdxl"
prompt = "A futuristic cityscape at sunset"

output = replicate.run(
    model_name,
    input={"prompt": prompt}
)
print("Generated image URLs:", output)

Performance

Latency: ~1-5 seconds depending on model complexity and server load
Cost: varies by model; check Replicate pricing per model
Rate limits: depend on your Replicate account plan; typically several requests per minute
  • Limit max_tokens or equivalent input parameters to reduce cost.
  • Reuse model versions to avoid overhead of loading new versions.
  • Batch inputs if supported by the model to optimize throughput.
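The "reuse model versions" tip refers to pinning a specific version in the model identifier, using the owner/name:version format. The hash below is a made-up placeholder, not a real version; you would pass the pinned string as the first argument to replicate.run():

```python
# Format: "<owner>/<name>:<version-hash>"; the hash here is a fake placeholder
pinned = "meta/meta-llama-3-8b-instruct:0123456789abcdef"

# Splitting on the first ":" recovers the two parts of the identifier
owner_and_name, _, version = pinned.partition(":")
print(owner_and_name)  # meta/meta-llama-3-8b-instruct
print(version)         # 0123456789abcdef
```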
Approach | Latency | Cost/call | Best for
Synchronous replicate.run() | ~1-5s | Model-dependent | Simple scripts and blocking calls
Asynchronous replicate.async_run() | ~1-5s | Model-dependent | Concurrent calls in async apps
Image generation with replicate.run() | ~3-10s | Higher for large models | Generating images or media

Quick tip

Always set REPLICATE_API_TOKEN in your environment, then call replicate.run() with a model identifier and an input dictionary for simple, direct calls.

Common mistake

Beginners often forget to set the REPLICATE_API_TOKEN environment variable, causing authentication errors.

Verified 2026-04 · meta/meta-llama-3-8b-instruct, stability-ai/sdxl