Replicate cost per prediction
Quick answer
Replicate bills per prediction, with the rate depending on the model: many official language models are priced per token (input and output counted separately), while most other models are billed by the time the prediction spends running on the model's hardware. Exact rates vary widely and are listed on each model's page on replicate.com. To estimate cost, check the model's pricing section before running it and monitor your usage via the dashboard or API.
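For a per-token model, the arithmetic behind a cost estimate is simple. The rates below are placeholders, not Replicate's actual prices; substitute the numbers from the model's pricing section on replicate.com.

```python
def estimate_token_cost(input_tokens: int, output_tokens: int,
                        usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimate the cost of one prediction for a model billed per token."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000

# Placeholder rates in USD per million tokens -- not real Replicate prices.
cost = estimate_token_cost(1_000, 500, usd_per_m_input=0.05, usd_per_m_output=0.25)
print(f"Estimated cost: ${cost:.6f}")  # Estimated cost: $0.000175
```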
PREREQUISITES
- Python 3.8+
- REPLICATE_API_TOKEN environment variable set
- pip install replicate
Setup
Install the official replicate Python package and set your API token as an environment variable for authentication.
```shell
pip install replicate
```

output

```
Collecting replicate
  Downloading replicate-0.10.0-py3-none-any.whl (30 kB)
Installing collected packages: replicate
Successfully installed replicate-0.10.0
```
Step by step
Use the replicate Python client to run a model prediction and understand cost implications by monitoring usage. Replicate bills per prediction based on the model's compute requirements.
```python
import os

import replicate

# Ensure your API token is set in the environment:
# export REPLICATE_API_TOKEN="your_token_here"
client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])

# Example: run the meta-llama-3-8b-instruct model
output = client.run(
    "meta/meta-llama-3-8b-instruct",
    input={"prompt": "Hello, how are you?", "max_tokens": 50},
)

# Language models return their output as a stream of text chunks; join them.
print("Model output:", "".join(output))

# Note: cost depends on model and usage; check replicate.com pricing for details.
```

output

```
Model output: Hello! I'm doing well, thank you. How can I assist you today?
```
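For models billed by compute time rather than per token, you can estimate what a finished prediction cost from its runtime metrics. This is a sketch, not a billing tool: the per-second rate below is a placeholder (look up the actual hardware rate on replicate.com/pricing), and the `predict_time` metric is read defensively since not every prediction reports it.

```python
import os

def compute_time_cost(predict_time_s: float, usd_per_second: float) -> float:
    """Cost of a prediction billed by time on the model's hardware."""
    return predict_time_s * usd_per_second

# Placeholder per-second rate -- verify against replicate.com/pricing.
USD_PER_SECOND = 0.000725

if "REPLICATE_API_TOKEN" in os.environ:
    import replicate

    client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])
    # Finished predictions carry runtime metrics, including predict_time.
    for prediction in client.predictions.list():
        predict_time = (prediction.metrics or {}).get("predict_time")
        if predict_time is not None:
            estimate = compute_time_cost(predict_time, USD_PER_SECOND)
            print(prediction.id, f"~${estimate:.6f}")
```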
Common variations
You can run different models by changing the model name string in client.run(). Some models charge per token, others per inference. Async usage is supported with await client.async_run() in async contexts.
```python
import asyncio
import os

import replicate

async def async_example() -> None:
    client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])
    output = await client.async_run(
        "stability-ai/stable-diffusion",
        input={"prompt": "A futuristic cityscape at sunset"},
    )
    # Image models typically return a list of output file URLs.
    print("Async model output URL:", output[0])

asyncio.run(async_example())
```

output

```
Async model output URL: https://replicate.delivery/abcd1234.png
```
Troubleshooting
- If you get authentication errors, verify your REPLICATE_API_TOKEN is set correctly.
- For quota or billing issues, check your Replicate dashboard usage and limits.
- Model-specific errors often indicate input format issues; consult the model's documentation on replicate.com.
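The failure modes above can be told apart in code. This sketch assumes the Python client's ModelError and ReplicateError exception classes; the preflight helper only checks that a token is present in the environment, not that it is valid.

```python
import os

def token_is_configured() -> bool:
    """Preflight check for the most common failure: a missing API token."""
    return bool(os.environ.get("REPLICATE_API_TOKEN"))

if token_is_configured():
    import replicate
    from replicate.exceptions import ModelError, ReplicateError

    client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])
    try:
        client.run("meta/meta-llama-3-8b-instruct", input={"prompt": "Hello"})
    except ModelError as err:
        # The model itself failed -- often a malformed input; check its docs.
        print("Model failed:", err)
    except ReplicateError as err:
        # API-level failure, e.g. authentication, quota, or billing problems.
        print("API error:", err)
else:
    print("Set REPLICATE_API_TOKEN before making authenticated calls.")
```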
Key Takeaways
- Replicate pricing varies by model: predictions are billed per token or by compute time on the model's hardware.
- Always check the specific model's pricing page on replicate.com before usage.
- Use the official replicate Python client with your API token for authenticated calls.