Replicate cost per prediction
Quick answer
Replicate bills per prediction, with the rate depending on the model: many official language models are priced per token (input and output counted separately), while most other models are billed by the time the prediction spends running on the model's hardware. Exact rates vary widely and are listed on each model's page on replicate.com. To estimate cost, check the model's pricing section before running it and monitor your usage via the dashboard or API.
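For a per-token model, the arithmetic behind a cost estimate is simple. The rates below are placeholders, not Replicate's actual prices; substitute the numbers from the model's pricing section on replicate.com.

```python
def estimate_token_cost(input_tokens: int, output_tokens: int,
                        usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimate the cost of one prediction for a model billed per token."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000

# Placeholder rates in USD per million tokens -- not real Replicate prices.
cost = estimate_token_cost(1_000, 500, usd_per_m_input=0.05, usd_per_m_output=0.25)
print(f"Estimated cost: ${cost:.6f}")  # Estimated cost: $0.000175
```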
PREREQUISITES
- Python 3.8+
- REPLICATE_API_TOKEN environment variable set
- pip install replicate
Setup
Install the official replicate Python package and set your API token as an environment variable for authentication.
```shell
pip install replicate
```

output

```
Collecting replicate
  Downloading replicate-0.10.0-py3-none-any.whl (30 kB)
Installing collected packages: replicate
Successfully installed replicate-0.10.0
```
Step by step
Use the replicate Python client to run a model prediction and understand cost implications by monitoring usage. Replicate bills per prediction based on the model's compute requirements.
```python
import os

import replicate

# Ensure your API token is set in the environment:
# export REPLICATE_API_TOKEN="your_token_here"
client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])

# Example: run the meta-llama-3-8b-instruct model
output = client.run(
    "meta/meta-llama-3-8b-instruct",
    input={"prompt": "Hello, how are you?", "max_tokens": 50},
)

# Language models return their output as a stream of text chunks; join them.
print("Model output:", "".join(output))

# Note: cost depends on model and usage; check replicate.com pricing for details.
```

output

```
Model output: Hello! I'm doing well, thank you. How can I assist you today?
```
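For models billed by compute time rather than per token, you can estimate what a finished prediction cost from its runtime metrics. This is a sketch, not a billing tool: the per-second rate below is a placeholder (look up the actual hardware rate on replicate.com/pricing), and the `predict_time` metric is read defensively since not every prediction reports it.

```python
import os

def compute_time_cost(predict_time_s: float, usd_per_second: float) -> float:
    """Cost of a prediction billed by time on the model's hardware."""
    return predict_time_s * usd_per_second

# Placeholder per-second rate -- verify against replicate.com/pricing.
USD_PER_SECOND = 0.000725

if "REPLICATE_API_TOKEN" in os.environ:
    import replicate

    client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])
    # Finished predictions carry runtime metrics, including predict_time.
    for prediction in client.predictions.list():
        predict_time = (prediction.metrics or {}).get("predict_time")
        if predict_time is not None:
            estimate = compute_time_cost(predict_time, USD_PER_SECOND)
            print(prediction.id, f"~${estimate:.6f}")
```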
Common variations
You can run different models by changing the model name string in client.run(). Some models charge per token, others per inference. Async usage is supported with await client.async_run() in async contexts.
```python
import asyncio
import os

import replicate

async def async_example() -> None:
    client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])
    output = await client.async_run(
        "stability-ai/stable-diffusion",
        input={"prompt": "A futuristic cityscape at sunset"},
    )
    # Image models typically return a list of output file URLs.
    print("Async model output URL:", output[0])

asyncio.run(async_example())
```

output

```
Async model output URL: https://replicate.delivery/abcd1234.png
```
Troubleshooting
- If you get authentication errors, verify your REPLICATE_API_TOKEN is set correctly.
- For quota or billing issues, check your Replicate dashboard usage and limits.
- Model-specific errors often indicate input format issues; consult the model's documentation on replicate.com.
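The failure modes above can be told apart in code. This sketch assumes the Python client's ModelError and ReplicateError exception classes; the preflight helper only checks that a token is present in the environment, not that it is valid.

```python
import os

def token_is_configured() -> bool:
    """Preflight check for the most common failure: a missing API token."""
    return bool(os.environ.get("REPLICATE_API_TOKEN"))

if token_is_configured():
    import replicate
    from replicate.exceptions import ModelError, ReplicateError

    client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])
    try:
        client.run("meta/meta-llama-3-8b-instruct", input={"prompt": "Hello"})
    except ModelError as err:
        # The model itself failed -- often a malformed input; check its docs.
        print("Model failed:", err)
    except ReplicateError as err:
        # API-level failure, e.g. authentication, quota, or billing problems.
        print("API error:", err)
else:
    print("Set REPLICATE_API_TOKEN before making authenticated calls.")
```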
Key Takeaways
- Replicate pricing varies by model: predictions are billed per token or by compute time on the model's hardware.
- Always check the specific model's pricing page on replicate.com before usage.
- Use the official replicate Python client with your API token for authenticated calls.