Code intermediate · 3 min read

How to use the async OpenAI API in Python

Direct answer
Use the official OpenAI Python SDK's async client by importing AsyncOpenAI and calling await client.chat.completions.create() within an async function.

Setup

Install
bash
pip install openai
Env vars
OPENAI_API_KEY
Imports
python
import os
import asyncio
from openai import AsyncOpenAI

Examples

In: Send a simple async chat completion request with prompt 'Hello, async world!'
Out: Assistant: Hello, async world! How can I assist you today?
In: Make two concurrent async requests to generate greetings for 'Alice' and 'Bob'
Out: Assistant 1: Hello Alice! How can I help? Assistant 2: Hi Bob! What can I do for you?
In: Handle an empty message list in async call
Out: Error or empty response handled gracefully
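The empty-message case can also be caught before the request is sent. A minimal sketch (validate_messages is a hypothetical helper, not part of the SDK; the real API would reject an empty list with a 400 error):

```python
# Hypothetical guard: reject an empty message list locally,
# instead of letting the API return a 400 error.
def validate_messages(messages: list) -> None:
    if not messages:
        raise ValueError("messages must contain at least one entry")

try:
    validate_messages([])
except ValueError as exc:
    print(f"Handled gracefully: {exc}")
```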

Integration steps

  1. Install the OpenAI Python SDK and set your API key in the environment variable OPENAI_API_KEY
  2. Import AsyncOpenAI and asyncio in your Python script
  3. Create an async function and instantiate the AsyncOpenAI client with the API key
  4. Use await client.chat.completions.create() to send async requests with the model and messages
  5. Run the async function using asyncio.run() and process the response from response.choices[0].message.content

Full code

python
import os
import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello, async world!"}]
    )
    print("Assistant:", response.choices[0].message.content)

if __name__ == "__main__":
    asyncio.run(main())
output
Assistant: Hello, async world! How can I assist you today?

API trace

Request
json
{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello, async world!"}]}
Response
json
{"choices": [{"message": {"content": "Hello, async world! How can I assist you today?"}}], "usage": {"total_tokens": 15}}
Extract: response.choices[0].message.content

Variants

Streaming async chat completion

Use streaming to display partial results in real-time for better user experience with long responses.

python
import os
import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Stream this response."}],
        stream=True
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

if __name__ == "__main__":
    asyncio.run(main())
Concurrent multiple async requests

Use concurrent async calls to handle multiple independent requests efficiently.

python
import os
import asyncio
from openai import AsyncOpenAI

async def fetch_response(client, prompt):
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    prompts = ["Hello Alice!", "Hi Bob!"]
    tasks = [fetch_response(client, p) for p in prompts]
    results = await asyncio.gather(*tasks)
    for i, res in enumerate(results):
        print(f"Assistant {i+1}: {res}")

if __name__ == "__main__":
    asyncio.run(main())
Use smaller model for faster async calls

Use smaller models like gpt-4o-mini for lower latency and cost when high fidelity is not critical.

python
import os
import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Quick async test."}]
    )
    print("Assistant:", response.choices[0].message.content)

if __name__ == "__main__":
    asyncio.run(main())

Performance

Latency: ~800ms for gpt-4o non-streaming async calls
Cost: ~$0.002 per 500 tokens for gpt-4o
Rate limits: Tier 1: 500 requests per minute / 30,000 tokens per minute
  • Keep prompts concise to reduce token usage
  • Use smaller models like gpt-4o-mini for cheaper calls
  • Cache frequent responses to avoid repeated calls
Approach | Latency | Cost/call | Best for
Async non-streaming | ~800ms | ~$0.002 | Concurrent calls with full response
Async streaming | Starts within ~300ms | ~$0.002 | Real-time partial output display
Async with smaller model | ~400ms | ~$0.001 | Faster, cheaper, less complex tasks
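To stay under the rate limits above when launching many concurrent tasks, an asyncio.Semaphore can cap in-flight requests. A sketch where fake_call stands in for the real AsyncOpenAI request, and the limit of 5 is an arbitrary illustration rather than an official value:

```python
import asyncio

async def fake_call(i: int) -> str:
    # Stand-in for await client.chat.completions.create(...)
    await asyncio.sleep(0.01)
    return f"response {i}"

async def limited_call(sem: asyncio.Semaphore, i: int) -> str:
    async with sem:  # waits if too many requests are already in flight
        return await fake_call(i)

async def main() -> list:
    sem = asyncio.Semaphore(5)  # tune to your tier's limits
    return await asyncio.gather(*(limited_call(sem, i) for i in range(20)))

results = asyncio.run(main())
print(len(results))  # 20
```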

Quick tip

Always use await with async client methods and run them inside an async function driven by asyncio.run(); otherwise the request never executes.

Common mistake

Calling async methods without await or outside an async function, which yields an un-run coroutine object instead of a response and triggers a "coroutine was never awaited" warning.
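The mistake is easy to reproduce with any coroutine; ask below is a stand-in for an async SDK method like client.chat.completions.create:

```python
import asyncio

async def ask(prompt: str) -> str:
    # Stand-in for an async SDK method
    return f"answer to {prompt}"

# Mistake: without await you get a coroutine object, not a string
result = ask("hi")
print(type(result).__name__)  # coroutine
result.close()  # silence the "never awaited" RuntimeWarning

# Correct: drive the coroutine with asyncio.run (or await it in async code)
print(asyncio.run(ask("hi")))  # answer to hi
```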

Verified 2026-04 · gpt-4o, gpt-4o-mini