How-to · Intermediate · 3 min read

How to use async calls to speed up AI apps

Quick answer
Use Python's asyncio with the SDK's async client (AsyncOpenAI) and await client.chat.completions.create() to run multiple AI requests concurrently, reducing total wait time. This speeds up AI apps by overlapping network calls instead of waiting for each one sequentially.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quotes keep the shell from treating >= as a redirect)

Setup

Install the OpenAI Python SDK and set your API key as an environment variable.

  • Run pip install openai
  • Set export OPENAI_API_KEY='your_api_key' on Linux/macOS or setx OPENAI_API_KEY "your_api_key" on Windows
bash
pip install openai

Step by step

Use Python's asyncio and the OpenAI SDK's AsyncOpenAI client to send multiple requests concurrently. This example sends two prompts in parallel and prints their responses.

python
import os
import asyncio
from openai import AsyncOpenAI

# The async client makes every request method awaitable
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def fetch_response(prompt):
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompts = ["Hello, AI!", "What is async programming?"]
    tasks = [fetch_response(p) for p in prompts]
    results = await asyncio.gather(*tasks)
    for i, res in enumerate(results):
        print(f"Response {i+1}: {res}")

if __name__ == "__main__":
    asyncio.run(main())
output
Response 1: Hello! How can I assist you today?
Response 2: Async programming allows tasks to run concurrently, improving efficiency.
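To see why gathering helps, here is a provider-agnostic sketch that simulates two 1-second network calls with asyncio.sleep; fake_request and the delay are illustrative stand-ins, not SDK methods.

```python
import asyncio
import time

async def fake_request(prompt):
    # Stand-in for an API call: a 1-second network round trip
    await asyncio.sleep(1)
    return f"echo: {prompt}"

async def run_concurrently(prompts):
    start = time.perf_counter()
    # gather() overlaps the waits instead of running them back to back
    results = await asyncio.gather(*(fake_request(p) for p in prompts))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(run_concurrently(["Hello", "World"]))
print(results, f"{elapsed:.2f}s")  # both waits overlap: ~1s total, not ~2s
```

Run sequentially, the same two calls would take about two seconds; gathered, they finish in roughly one.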

Common variations

The same pattern works with other providers, such as Anthropic's claude-3-5-sonnet-20241022 (requires pip install anthropic), and the async clients can also stream responses. For large-scale apps, combine async calls with batching or rate limiting.

python
import os
import asyncio
import anthropic

client = anthropic.AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

async def fetch_claude_response(prompt):
    message = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=200,
        system="You are a helpful assistant.",
        messages=[{"role": "user", "content": prompt}]
    )
    return message.content[0].text

async def main():
    prompts = ["Explain async in simple terms.", "Give me a joke."]
    tasks = [fetch_claude_response(p) for p in prompts]
    results = await asyncio.gather(*tasks)
    for i, res in enumerate(results):
        print(f"Claude Response {i+1}: {res}")

if __name__ == "__main__":
    asyncio.run(main())
output
Claude Response 1: Async means doing multiple things at once without waiting for each to finish.
Claude Response 2: Why did the AI go to school? To improve its neural network!

Troubleshooting

If you get RuntimeError: This event loop is already running, your environment (such as Jupyter) already has an event loop, which conflicts with asyncio.run(). Apply nest_asyncio, or simply await your coroutine directly in a notebook cell. Also handle API rate limits by catching the SDK's rate-limit exceptions and retrying with exponential backoff.

python
import nest_asyncio
nest_asyncio.apply()
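The retry-with-backoff idea looks like this in outline. FlakyCall is an illustrative stand-in that fails twice before succeeding; real code would catch the SDK's rate-limit exception instead of RuntimeError.

```python
import asyncio
import random

class FlakyCall:
    """Illustrative stand-in that fails twice, then succeeds."""
    def __init__(self):
        self.failures_left = 2

    async def __call__(self):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise RuntimeError("rate limited")
        return "ok"

async def with_backoff(call, max_retries=5, base_delay=0.1):
    for attempt in range(max_retries):
        try:
            return await call()
        except RuntimeError:
            # Exponential backoff with jitter: ~0.1s, 0.2s, 0.4s, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.05)
            await asyncio.sleep(delay)
    raise RuntimeError("exhausted retries")

result = asyncio.run(with_backoff(FlakyCall()))
print(result)  # "ok" after two retried failures
```

The jitter term spreads retries out so many clients hitting the same limit don't all retry at the same instant.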

Key Takeaways

  • Use the SDKs' async clients (AsyncOpenAI, AsyncAnthropic) to send multiple AI requests concurrently.
  • Python's asyncio.gather() efficiently runs multiple async calls in parallel.
  • Async calls reduce total latency by overlapping network waits instead of sequential blocking.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022