Replicate async predictions
Quick answer
Use the replicate.async_run function to run predictions asynchronously with the Replicate API in Python. Because the call is awaited instead of blocking, your program can do other work while a model runs, which makes it straightforward to run several predictions concurrently.
PREREQUISITES
- Python 3.8+
- A Replicate API token set in the environment variable REPLICATE_API_TOKEN
- pip install replicate
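Before running the examples, a quick pre-flight check can confirm both prerequisites. This is a minimal sketch using only the standard library; check_prerequisites is a hypothetical helper, not part of the replicate package.

```python
import os
import sys
from typing import List


def check_prerequisites() -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # The Python 3.8+ requirement from the prerequisites above
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required, found " + sys.version.split()[0])
    # The replicate client authenticates with this environment variable
    if not os.environ.get("REPLICATE_API_TOKEN"):
        problems.append("REPLICATE_API_TOKEN is not set")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    if issues:
        for issue in issues:
            print("Problem:", issue)
    else:
        print("Prerequisites look good")
```

The check only confirms that a token is present, not that it is valid; an invalid token still surfaces as an authentication error on the first API call.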
Setup
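Install the replicate Python package and set your API token as an environment variable for authentication. replicate.async_run is only available in newer releases of the package, so if an older version is already installed, upgrade it with pip install --upgrade replicate.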
pip install replicate
Example output (your version number will differ):
Collecting replicate
  Downloading replicate-0.10.0-py3-none-any.whl (30 kB)
Installing collected packages: replicate
Successfully installed replicate-0.10.0
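To confirm that the installed package actually exposes async_run, you can inspect it at runtime. This is a sketch using the standard importlib.metadata module; describe_replicate_install is a hypothetical helper name.

```python
import importlib
import importlib.metadata


def describe_replicate_install() -> str:
    """Report whether the installed replicate package provides async_run."""
    try:
        version = importlib.metadata.version("replicate")
    except importlib.metadata.PackageNotFoundError:
        return "replicate is not installed: pip install replicate"
    module = importlib.import_module("replicate")
    # Older releases of the package have no async_run attribute
    if hasattr(module, "async_run"):
        return f"replicate {version} provides async_run"
    return f"replicate {version} has no async_run: pip install --upgrade replicate"


if __name__ == "__main__":
    print(describe_replicate_install())
```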
Step by step
Use replicate.async_run with asyncio to run a model prediction asynchronously. This example calls the meta/meta-llama-3-8b-instruct model with a prompt and prints the output.
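import os
import asyncio
import replicate

async def main():
    # Fail early with a clear message if the token is missing
    if not os.environ.get("REPLICATE_API_TOKEN"):
        raise RuntimeError("Set the REPLICATE_API_TOKEN environment variable first")
    output = await replicate.async_run(
        "meta/meta-llama-3-8b-instruct",
        input={"prompt": "Hello, how are you?", "max_tokens": 50},
    )
    # Language models like this one typically return a list of text chunks;
    # join them into one string (other models may return a different type)
    text = "".join(output) if isinstance(output, list) else output
    print("Model output:", text)

if __name__ == "__main__":
    asyncio.run(main())
Output: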
Model output: Hello! I'm doing well, thank you. How can I assist you today?
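Language models on Replicate often return text as a sequence of chunks rather than a single string, so printing the raw output can show a list. A small helper can normalize the common shapes. This is a sketch: to_text is a hypothetical helper, and it handles lists, strings, None, and synchronous iterators, but not async iterators.

```python
from typing import Any


def to_text(output: Any) -> str:
    """Collapse a prediction's output into a single string."""
    if output is None:
        return ""
    if isinstance(output, str):
        return output
    try:
        # Lists, tuples, and synchronous iterators of text chunks
        return "".join(str(chunk) for chunk in output)
    except TypeError:
        # Not iterable: fall back to the plain string form
        return str(output)
```

For example, to_text(["Hello", "!"]) returns "Hello!", and a plain string passes through unchanged.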
Common variations
- Use replicate.run for synchronous calls if you don't need async.
- Call a different model by changing the model name string.
- Combine multiple async calls with asyncio.gather to run them concurrently.
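import asyncio
import replicate

async def call_multiple():
    # One coroutine per prompt; none of them starts until awaited
    tasks = [
        replicate.async_run(
            "meta/meta-llama-3-8b-instruct",
            input={"prompt": f"Say hello {i}", "max_tokens": 20},
        )
        for i in range(3)
    ]
    # gather runs the predictions concurrently and returns results in task order
    results = await asyncio.gather(*tasks)
    for i, res in enumerate(results):
        # Join list-of-chunks output into a single string
        text = "".join(res) if isinstance(res, list) else res
        print(f"Output {i}:", text)

if __name__ == "__main__":
    asyncio.run(call_multiple())
Output: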
Output 0: Hello 0!
Output 1: Hello 1!
Output 2: Hello 2!
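asyncio.gather starts every prediction at once. When submitting many prompts, you may want to cap how many run at the same time, for example to stay within rate limits. The sketch below does this with asyncio.Semaphore. To keep it runnable offline, fake_async_run is a hypothetical stand-in that sleeps instead of calling the API; in real code you would await replicate.async_run with the same arguments.

```python
import asyncio
import time
from typing import List


async def fake_async_run(model: str, input: dict) -> List[str]:
    # Hypothetical stand-in for replicate.async_run: waits 0.1 s, then echoes the prompt
    await asyncio.sleep(0.1)
    return ["Echo: ", input["prompt"]]


async def run_bounded(prompts: List[str], limit: int) -> List[str]:
    # Created inside the running event loop, as asyncio primitives expect
    semaphore = asyncio.Semaphore(limit)

    async def run_one(prompt: str) -> str:
        # At most `limit` predictions are in flight at any moment
        async with semaphore:
            output = await fake_async_run(
                "meta/meta-llama-3-8b-instruct",
                input={"prompt": prompt, "max_tokens": 20},
            )
        return "".join(output)

    # gather returns results in the same order as the prompts
    return list(await asyncio.gather(*(run_one(p) for p in prompts)))


if __name__ == "__main__":
    prompts = [f"Say hello {i}" for i in range(6)]
    start = time.perf_counter()
    results = asyncio.run(run_bounded(prompts, limit=3))
    for i, text in enumerate(results):
        print(f"Output {i}:", text)
    # Six 0.1 s calls, three at a time: about 0.2 s rather than 0.6 s sequentially
    print(f"Elapsed: {time.perf_counter() - start:.2f} s")
```

The limit of 3 is arbitrary; choose it based on the rate limits that apply to your account.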
Troubleshooting
- If you get authentication errors, verify that the REPLICATE_API_TOKEN environment variable is set correctly.
- For network timeouts, check your internet connection and retry.
- Make sure you are on Python 3.8 or higher. asyncio.run itself has been available since Python 3.7, but the replicate package requires 3.8+.
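For transient network failures, one option is to wrap each call in a timeout and retry it with exponential backoff. This is a sketch: with_retries is a hypothetical helper, the default attempt count and delays are arbitrary, and it retries on any exception, whereas real code should retry only errors that are actually transient (an authentication error will not fix itself). It takes a factory function because a coroutine object can only be awaited once.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    timeout: float = 60.0,
    base_delay: float = 1.0,
) -> T:
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # A fresh coroutine each attempt, cancelled if it exceeds the timeout
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                # Wait base_delay, then 2x, 4x, ... before the next attempt
                await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise last_error  # out of attempts: surface the last error


# With the real client (not run here), pass a lambda that creates the call:
# output = await with_retries(
#     lambda: replicate.async_run("meta/meta-llama-3-8b-instruct", input={"prompt": "Hi"})
# )

if __name__ == "__main__":
    calls = {"n": 0}

    async def flaky() -> str:
        # Hypothetical stand-in that fails twice, then succeeds
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("simulated network failure")
        return "ok"

    print(asyncio.run(with_retries(flaky, base_delay=0.01)))  # ok
```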
Key Takeaways
- Use replicate.async_run with await for non-blocking model predictions.
- Set REPLICATE_API_TOKEN in your environment for authentication.
- Combine multiple async calls with asyncio.gather for concurrent predictions.