How to use async calls to speed up AI apps
Quick answer
Use Python's asyncio with an async client — for example `AsyncOpenAI` and `await client.chat.completions.create()` — to run multiple AI requests concurrently, reducing total wait time. This speeds up AI apps by overlapping network calls instead of waiting for each one sequentially.

Prerequisites

- Python 3.8+
- An OpenAI API key (free tier works)
- `pip install "openai>=1.0"`
Setup
Install the OpenAI Python SDK and set your API key as an environment variable.
- Run `pip install openai`
- Set `export OPENAI_API_KEY='your_api_key'` on Linux/macOS, or `setx OPENAI_API_KEY "your_api_key"` on Windows

Step by step
Use Python's asyncio with the OpenAI SDK's `AsyncOpenAI` client to send multiple requests concurrently. This example sends two prompts in parallel and prints their responses.
```python
import os
import asyncio
from openai import AsyncOpenAI

# In openai>=1.0 there is no acreate(); async calls go through the
# AsyncOpenAI client, and the same create() method is awaited.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def fetch_response(prompt):
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompts = ["Hello, AI!", "What is async programming?"]
    tasks = [fetch_response(p) for p in prompts]
    # gather() runs both requests concurrently and preserves order
    results = await asyncio.gather(*tasks)
    for i, res in enumerate(results):
        print(f"Response {i+1}: {res}")

if __name__ == "__main__":
    asyncio.run(main())
```

Output
Response 1: Hello! How can I assist you today?
Response 2: Async programming allows tasks to run concurrently, improving efficiency.
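The speedup comes entirely from overlapping waits. You can see the effect without an API key by substituting `asyncio.sleep` for the network call (a minimal sketch; `fake_request` is a stand-in, not an SDK method):

```python
import asyncio
import time

async def fake_request(delay: float) -> float:
    # Stand-in for a network-bound API call
    await asyncio.sleep(delay)
    return delay

async def main() -> None:
    start = time.perf_counter()
    # Three 1-second "requests" overlap instead of running back to back
    await asyncio.gather(*(fake_request(1.0) for _ in range(3)))
    elapsed = time.perf_counter() - start
    print(f"Elapsed: {elapsed:.1f}s")  # roughly 1s total, not 3s

asyncio.run(main())
```

Run sequentially, the three awaits would take about 3 seconds; gathered, they finish in about 1.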
Common variations
You can use async calls with other providers, such as claude-3-5-sonnet-20241022 via Anthropic's `AsyncAnthropic` client, or stream responses asynchronously. You can also combine async calls with batching or rate limiting for large-scale apps.
```python
import os
import asyncio
import anthropic

# The async Anthropic client is AsyncAnthropic; messages.create() is awaited
# (the sync Anthropic client has no awaitable acreate() method).
client = anthropic.AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

async def fetch_claude_response(prompt):
    message = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=200,
        system="You are a helpful assistant.",
        messages=[{"role": "user", "content": prompt}]
    )
    return message.content[0].text

async def main():
    prompts = ["Explain async in simple terms.", "Give me a joke."]
    tasks = [fetch_claude_response(p) for p in prompts]
    results = await asyncio.gather(*tasks)
    for i, res in enumerate(results):
        print(f"Claude Response {i+1}: {res}")

if __name__ == "__main__":
    asyncio.run(main())
```

Output
Claude Response 1: Async means doing multiple things at once without waiting for each to finish.
Claude Response 2: Why did the AI go to school? To improve its neural network!
Troubleshooting
If you get RuntimeError: This event loop is already running, your environment (such as Jupyter) already runs its own event loop, which conflicts with asyncio.run(). Apply nest_asyncio, or in a notebook simply await the coroutine directly instead of calling asyncio.run(). Also handle API rate limits by catching exceptions and retrying with backoff.
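For the rate-limit errors mentioned above, a retry loop with exponential backoff works well. A minimal sketch with a simulated flaky call (`with_backoff` and `sometimes_fails` are illustrative names, not SDK helpers; in real code, catch the SDK's specific exception, e.g. openai.RateLimitError, rather than bare Exception):

```python
import asyncio
import random

async def with_backoff(coro_factory, retries: int = 5, base_delay: float = 0.5):
    """Retry an async call, doubling the wait after each failure."""
    for attempt in range(retries):
        try:
            return await coro_factory()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries; surface the last error
            # Wait base_delay, 2x, 4x, ... plus a little jitter
            await asyncio.sleep(base_delay * 2 ** attempt + random.random() * 0.1)

# Demo: a call that fails twice before succeeding
attempts = {"n": 0}

async def sometimes_fails() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "success"

print(asyncio.run(with_backoff(lambda: sometimes_fails(), base_delay=0.01)))
```

The factory takes no arguments so each retry creates a fresh coroutine (an already-awaited coroutine cannot be awaited again).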
To fix the event-loop conflict in notebooks, apply nest_asyncio before running async code:

```python
import nest_asyncio
nest_asyncio.apply()
```

Key Takeaways
- Use the SDK's async clients (`AsyncOpenAI`, `AsyncAnthropic`) with awaited `create()` calls to send multiple AI requests concurrently.
- Python's `asyncio.gather()` runs multiple async calls in parallel and returns results in order.
- Async calls reduce total latency by overlapping network waits instead of blocking sequentially.