How to beginner · 3 min read

Replicate pricing

Quick answer
Replicate pricing is usage-based: depending on the model, you are charged per second of compute or per inference. Some models offer limited free usage, but most require payment via credits purchased on the Replicate platform. To run models from Python, install the replicate package and set your API token in the REPLICATE_API_TOKEN environment variable.

PREREQUISITES

  • Python 3.8+
  • Replicate API token (set REPLICATE_API_TOKEN environment variable)
  • pip install replicate

Setup

Install the official replicate Python package and set your API token as an environment variable to authenticate requests.

bash
pip install replicate
output
Collecting replicate
  Downloading replicate-0.10.0-py3-none-any.whl (30 kB)
Installing collected packages: replicate
Successfully installed replicate-0.10.0

Step by step

Use the replicate package to run a model. Pricing depends on the model: most are billed per second of GPU or CPU time, while some are billed per inference. Free usage is limited and varies by model.

python
import replicate

# The client reads the API token from the environment:
# export REPLICATE_API_TOKEN="your_token_here"
client = replicate.Client()

# Run a model (example: meta/meta-llama-3-8b-instruct)
output = client.run(
    "meta/meta-llama-3-8b-instruct",
    input={"prompt": "Hello, how are you?", "max_tokens": 50}
)

# Language models return text in chunks, so join them before printing
print("Model output:", "".join(str(chunk) for chunk in output))

# Note: Pricing is based on compute time used by the model run.
output
Model output: Hello! I'm doing well, thank you. How can I assist you today?
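
Because billing is tied to compute time, a rough cost estimate for one run is the run time multiplied by a per-second rate. The sketch below is illustrative only: it assumes the prediction's metrics are available as a dict with a `predict_time` field in seconds, and both the `cost_from_metrics` helper and the rate are hypothetical. Check the model's page on Replicate for its actual rate.

```python
def cost_from_metrics(metrics: dict, price_per_second: float) -> float:
    """Estimate the USD cost of one run from its metrics.

    Assumes metrics["predict_time"] holds the billed run time in seconds;
    a missing field is treated as zero seconds.
    """
    if price_per_second < 0:
        raise ValueError("price_per_second must be non-negative")
    seconds = float(metrics.get("predict_time", 0.0))
    return seconds * price_per_second


# Hand-written stand-in for a real prediction's metrics (not real data)
example_metrics = {"predict_time": 4.0}

# HYPOTHETICAL rate for illustration; real rates vary by model and hardware
EXAMPLE_PRICE_PER_SECOND = 0.001

cost = cost_from_metrics(example_metrics, EXAMPLE_PRICE_PER_SECOND)
print(f"Estimated cost: ${cost:.4f}")
```

Keeping the rate as an explicit argument makes it easy to swap in the value that applies to the model you run.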

Common variations

You can run image generation models or other types by changing the model name and input parameters. Async usage is supported via asyncio with the replicate package. Pricing varies by model type (text, image, video) and compute resources used.

python
import asyncio
import replicate

async def main():
    client = replicate.Client()
    output = await client.async_run(
        "stability-ai/stable-diffusion",
        input={"prompt": "A futuristic cityscape at sunset"}
    )
    print("Image URL:", output[0])

asyncio.run(main())
output
Image URL: https://replicate.delivery/your-generated-image.png
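
Since some models bill per compute second and others per inference, it can help to estimate a batch under both schemes before running it. The following is a minimal sketch, not Replicate's billing logic: the `Pricing` class, the `batch_cost` function, and every price in it are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """How a model is billed. All prices used below are hypothetical."""
    mode: str          # "per_second" or "per_run"
    unit_price: float  # USD per second, or USD per run


def batch_cost(pricing: Pricing, runs: int, avg_seconds: float = 0.0) -> float:
    """Estimate the USD cost of a batch of runs under the given pricing."""
    if pricing.mode == "per_second":
        return runs * avg_seconds * pricing.unit_price
    if pricing.mode == "per_run":
        return runs * pricing.unit_price
    raise ValueError(f"unknown pricing mode: {pricing.mode!r}")


# HYPOTHETICAL prices for illustration only
text_model = Pricing(mode="per_second", unit_price=0.001)
image_model = Pricing(mode="per_run", unit_price=0.02)

print(f"100 text runs:  ${batch_cost(text_model, runs=100, avg_seconds=3.0):.2f}")
print(f"100 image runs: ${batch_cost(image_model, runs=100):.2f}")
```

For per-second models the estimate is only as good as the average run time you supply, so measure a few real runs first.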

Troubleshooting

  • If you get authentication errors, verify your REPLICATE_API_TOKEN is set correctly.
  • For quota exceeded errors, check your usage dashboard on Replicate and consider adding credits or reducing usage.
  • Model-specific errors may require checking the model documentation on Replicate.
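
Authentication errors are often caused by a missing or empty token, which can be caught before any request is sent. The check below is a small sketch using only the standard library; the `get_replicate_token` helper is a hypothetical name, not part of the replicate package.

```python
import os


def get_replicate_token(env=None) -> str:
    """Return the API token from the environment, or raise a clear error.

    `env` defaults to os.environ; pass a dict to test without touching
    the real environment.
    """
    env = os.environ if env is None else env
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set. "
            'Run: export REPLICATE_API_TOKEN="your_token_here"'
        )
    return token


# Example with an explicit mapping instead of the real environment
token = get_replicate_token({"REPLICATE_API_TOKEN": "your_token_here"})
print("Token found, length:", len(token))
```

Calling `get_replicate_token()` with no argument at the start of a script fails fast with a readable message instead of an authentication error from the API.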

Key Takeaways

  • Replicate pricing is usage-based, billed per compute second or per inference depending on the model.
  • Set your API token in the environment variable REPLICATE_API_TOKEN before using the SDK.
  • Pricing varies by model type and compute resources; check Replicate's dashboard for detailed costs.
  • Use the official replicate Python package for easy integration and usage tracking.
Verified 2026-04 · meta/meta-llama-3-8b-instruct, stability-ai/stable-diffusion