Code intermediate · 3 min read

How to use the async OpenAI API in Python

Direct answer
Use the official OpenAI Python SDK's async client by importing AsyncOpenAI and calling await client.chat.completions.create() within an async function.

Setup

Install
bash
pip install openai
Env vars
OPENAI_API_KEY
Imports
python
import os
import asyncio
from openai import AsyncOpenAI

Examples

In: Send a simple async chat completion request with prompt 'Hello, async world!'
Out: Assistant: Hello, async world! How can I assist you today?
In: Make two concurrent async requests to generate greetings for 'Alice' and 'Bob'
Out: Assistant 1: Hello Alice! How can I help? Assistant 2: Hi Bob! What can I do for you?
In: Handle an empty message list in async call
Out: Error or empty response handled gracefully
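The empty-message case can also be caught before the request is sent. A minimal sketch (validate_messages is a hypothetical helper, not part of the SDK; the real API would reject an empty list with a 400 error):

```python
# Hypothetical guard: reject an empty message list locally,
# instead of letting the API return a 400 error.
def validate_messages(messages: list) -> None:
    if not messages:
        raise ValueError("messages must contain at least one entry")

try:
    validate_messages([])
except ValueError as exc:
    print(f"Handled gracefully: {exc}")
```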

Integration steps

  1. Install the OpenAI Python SDK and set your API key in the environment variable OPENAI_API_KEY
  2. Import AsyncOpenAI and asyncio in your Python script
  3. Create an async function and instantiate the AsyncOpenAI client with the API key
  4. Use await client.chat.completions.create() to send async requests with the model and messages
  5. Run the async function using asyncio.run() and process the response from response.choices[0].message.content

Full code

python
import os
import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello, async world!"}]
    )
    print("Assistant:", response.choices[0].message.content)

if __name__ == "__main__":
    asyncio.run(main())
output
Assistant: Hello, async world! How can I assist you today?

API trace

Request
json
{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello, async world!"}]}
Response
json
{"choices": [{"message": {"content": "Hello, async world! How can I assist you today?"}}], "usage": {"total_tokens": 15}}
Extract: response.choices[0].message.content

Variants

Streaming async chat completion

Use streaming to display partial results in real-time for better user experience with long responses.

python
import os
import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Stream this response."}],
        stream=True
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

if __name__ == "__main__":
    asyncio.run(main())
Concurrent multiple async requests

Use concurrent async calls to handle multiple independent requests efficiently.

python
import os
import asyncio
from openai import AsyncOpenAI

async def fetch_response(client, prompt):
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    prompts = ["Hello Alice!", "Hi Bob!"]
    tasks = [fetch_response(client, p) for p in prompts]
    results = await asyncio.gather(*tasks)
    for i, res in enumerate(results):
        print(f"Assistant {i+1}: {res}")

if __name__ == "__main__":
    asyncio.run(main())
Use smaller model for faster async calls

Use smaller models like gpt-4o-mini for lower latency and cost when high fidelity is not critical.

python
import os
import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Quick async test."}]
    )
    print("Assistant:", response.choices[0].message.content)

if __name__ == "__main__":
    asyncio.run(main())

Performance

Latency: ~800ms for gpt-4o non-streaming async calls
Cost: ~$0.002 per 500 tokens for gpt-4o
Rate limits: Tier 1: 500 requests per minute / 30,000 tokens per minute
  • Keep prompts concise to reduce token usage
  • Use smaller models like gpt-4o-mini for cheaper calls
  • Cache frequent responses to avoid repeated calls
Approach | Latency | Cost/call | Best for
Async non-streaming | ~800ms | ~$0.002 | Concurrent calls with full response
Async streaming | Starts within ~300ms | ~$0.002 | Real-time partial output display
Async with smaller model | ~400ms | ~$0.001 | Faster, cheaper, less complex tasks
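To stay under the rate limits above when launching many concurrent tasks, an asyncio.Semaphore can cap in-flight requests. A sketch where fake_call stands in for the real AsyncOpenAI request, and the limit of 5 is an arbitrary illustration rather than an official value:

```python
import asyncio

async def fake_call(i: int) -> str:
    # Stand-in for await client.chat.completions.create(...)
    await asyncio.sleep(0.01)
    return f"response {i}"

async def limited_call(sem: asyncio.Semaphore, i: int) -> str:
    async with sem:  # waits if too many requests are already in flight
        return await fake_call(i)

async def main() -> list:
    sem = asyncio.Semaphore(5)  # tune to your tier's limits
    return await asyncio.gather(*(limited_call(sem, i) for i in range(20)))

results = asyncio.run(main())
print(len(results))  # 20
```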

Quick tip

Always use await with async client methods and run them inside an async function driven by asyncio.run(); otherwise the request never executes.

Common mistake

Calling async methods without await or outside an async function, which yields an un-run coroutine object instead of a response and triggers a "coroutine was never awaited" warning.
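The mistake is easy to reproduce with any coroutine; ask below is a stand-in for an async SDK method like client.chat.completions.create:

```python
import asyncio

async def ask(prompt: str) -> str:
    # Stand-in for an async SDK method
    return f"answer to {prompt}"

# Mistake: without await you get a coroutine object, not a string
result = ask("hi")
print(type(result).__name__)  # coroutine
result.close()  # silence the "never awaited" RuntimeWarning

# Correct: drive the coroutine with asyncio.run (or await it in async code)
print(asyncio.run(ask("hi")))  # answer to hi
```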

Verified 2026-04 · gpt-4o, gpt-4o-mini