
Replicate async predictions

Quick answer

Use replicate.async_run to run Replicate predictions asynchronously from Python. It is a coroutine: awaiting it lets the event loop do other work while the prediction runs, so your application can run several predictions concurrently instead of one after another.

PREREQUISITES

Python 3.8+
Replicate API token set in environment variable REPLICATE_API_TOKEN
pip install replicate

Setup

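Install the replicate Python package and set your API token in the REPLICATE_API_TOKEN environment variable; the client reads the token from there automatically. replicate.async_run is only available in newer releases of the client, so upgrade an older install with pip install --upgrade replicate.

bash

pip install replicate
export REPLICATE_API_TOKEN=<your-api-token>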

output

Collecting replicate
  ...
Successfully installed ... replicate-<version>
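Before running the examples, a quick sanity check can catch the two most common setup problems: a missing token and a client release too old to provide async_run. This is a minimal sketch; check_setup is a hypothetical helper, not part of the replicate package. It only inspects the environment and the installed module and makes no API calls.

```python
import importlib
import importlib.util
import os
from typing import List


def check_setup() -> List[str]:
    """Return human-readable setup problems (an empty list if none were found)."""
    problems = []

    # The replicate client reads its token from this environment variable by default.
    if not os.environ.get("REPLICATE_API_TOKEN"):
        problems.append("REPLICATE_API_TOKEN is not set")

    # Look for the package without importing it unconditionally.
    if importlib.util.find_spec("replicate") is None:
        problems.append("the replicate package is not installed (pip install replicate)")
    else:
        replicate = importlib.import_module("replicate")
        # Older client releases do not provide async_run.
        if not hasattr(replicate, "async_run"):
            problems.append("installed replicate has no async_run; run pip install --upgrade replicate")

    return problems


if __name__ == "__main__":
    issues = check_setup()
    print("Setup looks OK" if not issues else "\n".join(issues))
```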

Step by step

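Call replicate.async_run inside an async function and start the event loop with asyncio.run. This example sends a prompt to the meta/meta-llama-3-8b-instruct model and prints the reply. Language models on Replicate typically return their text as a list of chunks, so the example joins them into a single string. The exact reply varies from run to run.

python

import os
import asyncio
import replicate

async def main():
    # Send one prediction request and wait for it without blocking the event loop
    output = await replicate.async_run(
        "meta/meta-llama-3-8b-instruct",
        input={"prompt": "Hello, how are you?", "max_tokens": 50},
    )
    # The model returns its text as a list of chunks; join them into one string
    print("Model output:", "".join(output))

if __name__ == "__main__":
    # The client reads REPLICATE_API_TOKEN from the environment; fail early if it is missing
    if not os.environ.get("REPLICATE_API_TOKEN"):
        raise SystemExit("Set the REPLICATE_API_TOKEN environment variable first.")
    asyncio.run(main())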

output

Model output: Hello! I'm doing well, thank you. How can I assist you today?
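For comparison, here is a minimal synchronous sketch of the same request using replicate.run, which blocks until the prediction finishes. The generate function and its optional run parameter are illustrative additions, not part of the replicate API: run defaults to replicate.run and exists only so the function can be exercised with a stand-in that makes no network call. Joining the chunks assumes the model returns its text as a list of strings.

```python
from typing import Callable, Iterable, Optional

MODEL = "meta/meta-llama-3-8b-instruct"


def generate(
    prompt: str,
    max_tokens: int = 50,
    run: Optional[Callable[..., Iterable[str]]] = None,
) -> str:
    """Blocking call: returns only after the prediction has completed."""
    if run is None:
        import replicate  # imported here so the stand-in path does not need the package

        run = replicate.run
    output = run(MODEL, input={"prompt": prompt, "max_tokens": max_tokens})
    # Language models typically return a list of text chunks; join them into one string.
    return "".join(output)
```

Calling print(generate("Hello, how are you?")) blocks the calling thread for the whole prediction, which is fine in a script but stalls an event loop if done inside async code.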

Common variations

Use replicate.run for synchronous calls if async is not needed.
Call different models by changing the model name string.
Combine multiple async calls with asyncio.gather for concurrency.

python

import asyncio
import replicate

async def call_multiple():
    # Create one coroutine per prompt; none of them runs until awaited
    tasks = [
        replicate.async_run(
            "meta/meta-llama-3-8b-instruct",
            input={"prompt": f"Say hello {i}", "max_tokens": 20},
        )
        for i in range(3)
    ]
    # Run all predictions concurrently; results come back in the same order as tasks
    results = await asyncio.gather(*tasks)
    for i, res in enumerate(results):
        print(f"Output {i}:", "".join(res))

if __name__ == "__main__":
    asyncio.run(call_multiple())

output

Output 0: Hello 0!
Output 1: Hello 1!
Output 2: Hello 2!
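asyncio.gather as used above starts every prediction at once and raises on the first failure. When sending many requests, a common pattern is to cap how many are in flight with an asyncio.Semaphore and pass return_exceptions=True so one failed prediction does not hide the others' results. This is a minimal sketch: run_bounded and call_llama are hypothetical helper names, and the default limit of 2 is an arbitrary placeholder, not a Replicate recommendation.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Union


async def run_bounded(
    prompts: Sequence[str],
    call: Callable[[str], Awaitable[str]],
    limit: int = 2,
) -> List[Union[str, BaseException]]:
    """Run call(prompt) for every prompt, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def one(prompt: str) -> str:
        async with semaphore:  # waits here while `limit` calls are already running
            return await call(prompt)

    # Results keep input order; failures are returned as exception objects, not raised.
    return await asyncio.gather(*(one(p) for p in prompts), return_exceptions=True)


async def call_llama(prompt: str) -> str:
    import replicate  # imported lazily so run_bounded can be used without the package

    output = await replicate.async_run(
        "meta/meta-llama-3-8b-instruct",
        input={"prompt": prompt, "max_tokens": 20},
    )
    return "".join(output)
```

A call such as asyncio.run(run_bounded([f"Say hello {i}" for i in range(10)], call_llama, limit=3)) returns one entry per prompt; check each with isinstance(result, BaseException) before using it as text.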

Troubleshooting

If you get authentication errors, verify your REPLICATE_API_TOKEN environment variable is set correctly.
For network timeouts, check your internet connection and retry.
Use Python 3.8 or higher, as listed in the prerequisites (asyncio.run itself has been available since Python 3.7).
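For timeouts, it can help to bound how long each attempt may take and retry a few times before giving up. The sketch below wraps a zero-argument coroutine factory with asyncio.wait_for and exponential backoff. with_retries and its defaults (three attempts, 60-second timeout, 1-second initial delay) are illustrative assumptions, not values from the replicate documentation. By default it retries only asyncio.TimeoutError; pass your client version's transient network exception types via retry_on if you want those retried too, and let errors such as bad credentials propagate. Abandoning a wait on your side does not necessarily stop the prediction on Replicate, which may keep running.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def with_retries(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    timeout: float = 60.0,
    initial_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (asyncio.TimeoutError,),
) -> T:
    """Await make_call() with a per-attempt timeout, retrying on the given errors."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            # Build a fresh awaitable each time; a coroutine can only be awaited once.
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(delay)
            delay *= 2  # exponential backoff between attempts
    raise RuntimeError("attempts must be at least 1")
```

Inside an async function, output = await with_retries(lambda: replicate.async_run("meta/meta-llama-3-8b-instruct", input={"prompt": "Hello", "max_tokens": 50})) passes a lambda so each retry starts a new request.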


Key Takeaways

Use replicate.async_run with await for non-blocking model predictions.
Set REPLICATE_API_TOKEN in your environment for authentication.
Combine multiple async calls with asyncio.gather for concurrent predictions.

Verified 2026-04 · meta/meta-llama-3-8b-instruct
