Replicate async predictions

Quick answer
Use replicate.async_run to run Replicate model predictions asynchronously in Python. Awaiting the call lets the event loop do other work while the prediction runs, so several predictions can be in flight at once.

PREREQUISITES

  • Python 3.8+
  • Replicate API token set in environment variable REPLICATE_API_TOKEN
  • pip install replicate

Setup

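Install the replicate Python package and set your API token in the REPLICATE_API_TOKEN environment variable; the client reads it from there for authentication. The async API is not available in early releases of the client, so if replicate.async_run is missing, upgrade with pip install --upgrade replicate. The version number in your install output will differ from the sample shown here.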

bash
pip install replicate
output
Collecting replicate
  Downloading replicate-0.10.0-py3-none-any.whl (30 kB)
Installing collected packages: replicate
Successfully installed replicate-0.10.0
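
Before running the examples, it can help to confirm that the token is actually visible to Python. The following is a minimal sketch; the helper name replicate_token_configured is hypothetical and is not part of the replicate package. It only checks that the variable is present and non-empty, not that the token is valid.

```python
import os


def replicate_token_configured(env=None):
    """Return True when REPLICATE_API_TOKEN is present and non-empty.

    Hypothetical helper for illustration; not part of the replicate package.
    """
    # `env` can be injected for testing; by default the process environment is used.
    env = os.environ if env is None else env
    return bool(env.get("REPLICATE_API_TOKEN", "").strip())


if __name__ == "__main__":
    if replicate_token_configured():
        print("REPLICATE_API_TOKEN is set.")
    else:
        print("REPLICATE_API_TOKEN is missing; set it before running the examples.")
```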

Step by step

Use replicate.async_run with asyncio to run a model prediction asynchronously. This example calls the meta/meta-llama-3-8b-instruct model with a prompt and prints the output.

python
import asyncio
import replicate

async def main():
    # The client reads REPLICATE_API_TOKEN from the environment.
    output = await replicate.async_run(
        "meta/meta-llama-3-8b-instruct",
        input={"prompt": "Hello, how are you?", "max_tokens": 50},
    )
    # Language models typically return a list of text chunks; join them for display.
    print("Model output:", "".join(output))

if __name__ == "__main__":
    asyncio.run(main())
output
Model output: Hello! I'm doing well, thank you. How can I assist you today?
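
The shape of the returned value depends on the model: language models commonly return a list of text chunks, while other models may return a single value. The sketch below shows one way to normalize such results into a string before printing. The helper name collect_text is hypothetical, and the handling of non-string results is an assumption made for illustration, not documented client behavior.

```python
def collect_text(output):
    """Normalize a prediction result into a single string.

    Hypothetical helper for illustration; not part of the replicate package.
    """
    if output is None:
        return ""
    if isinstance(output, str):
        return output
    # Assumption: any other result is an iterable of chunks (e.g. a list of strings).
    return "".join(str(chunk) for chunk in output)


if __name__ == "__main__":
    print(collect_text(["Hello", "!", " How can I help?"]))
```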

Common variations

  • Use replicate.run for synchronous calls if async is not needed.
  • Call different models by changing the model name string.
  • Combine multiple async calls with asyncio.gather for concurrency, as in this example:

python
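import asyncio
import replicate

async def call_multiple():
    # Create one coroutine per prompt; asyncio.gather runs them concurrently.
    tasks = [
        replicate.async_run(
            "meta/meta-llama-3-8b-instruct",
            input={"prompt": f"Say hello {i}", "max_tokens": 20},
        )
        for i in range(3)
    ]
    # Results come back in the same order as the tasks.
    results = await asyncio.gather(*tasks)
    for i, res in enumerate(results):
        # Each result is typically a list of text chunks; join them for display.
        print(f"Output {i}:", "".join(res))

if __name__ == "__main__":
    asyncio.run(call_multiple())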
output
Output 0: Hello 0!
Output 1: Hello 1!
Output 2: Hello 2!
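
When the number of prompts grows, launching every call at once may run into rate limits, and a single failure makes asyncio.gather raise before you see the other results. A minimal sketch of one way to cap concurrency and keep per-call errors is shown below. The names gather_limited and fake_prediction are hypothetical, and fake_prediction is a stand-in for a real model call so the sketch runs without network access.

```python
import asyncio


async def gather_limited(coro_fn, inputs, limit=2):
    """Run coro_fn(item) for every item, at most `limit` at a time, keeping input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item):
        # Only `limit` coroutines can hold the semaphore at the same time.
        async with semaphore:
            return await coro_fn(item)

    # return_exceptions=True returns failures as values instead of raising the first one.
    return await asyncio.gather(*(run_one(i) for i in inputs), return_exceptions=True)


async def fake_prediction(prompt):
    # Stand-in for a real model call (hypothetical), so no API token is needed here.
    await asyncio.sleep(0.01)
    if not prompt:
        raise ValueError("empty prompt")
    return f"echo: {prompt}"


if __name__ == "__main__":
    results = asyncio.run(gather_limited(fake_prediction, ["a", "", "b"]))
    for result in results:
        print(result)
```

With the real client, coro_fn could be a small function that takes a prompt and returns replicate.async_run(...) for it; the right limit depends on your account and workload.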

Troubleshooting

  • If you get authentication errors, verify your REPLICATE_API_TOKEN environment variable is set correctly.
  • For network timeouts, check your internet connection and retry.
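  • Ensure your Python version is 3.8 or higher, as listed in the prerequisites. asyncio.run itself was added in Python 3.7, so older interpreters cannot run these examples.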
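
The "retry" advice for network timeouts can be sketched as a small wrapper with a per-attempt timeout and exponential backoff. This is an illustration under stated assumptions: with_retries and FlakyCall are hypothetical names, and the default exception types retried here are a guess at what a network failure looks like, so adjust retry_on to the errors you actually observe. FlakyCall simulates a failing call so the sketch runs offline.

```python
import asyncio


async def with_retries(make_call, attempts=3, timeout=60.0, base_delay=1.0,
                       retry_on=(asyncio.TimeoutError, ConnectionError)):
    """Await make_call() with a per-attempt timeout, retrying with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            # make_call must return a fresh awaitable on every attempt.
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


class FlakyCall:
    """Simulated call (hypothetical) that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.remaining = failures
        self.calls = 0

    async def __call__(self):
        self.calls += 1
        if self.remaining > 0:
            self.remaining -= 1
            raise ConnectionError("simulated network failure")
        return "ok"


if __name__ == "__main__":
    flaky = FlakyCall(failures=2)
    print(asyncio.run(with_retries(flaky, attempts=3, base_delay=0.1)))
```

With the real client, make_call could be a lambda that returns replicate.async_run(...). Keep in mind that each retry likely starts a new prediction, so a modest attempts value is a sensible default.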

Key Takeaways

  • Use replicate.async_run with await for non-blocking model predictions.
  • Set REPLICATE_API_TOKEN in your environment for authentication.
  • Combine multiple async calls with asyncio.gather for concurrent predictions.
Verified 2026-04 · meta/meta-llama-3-8b-instruct