How to make concurrent OpenAI API calls in Python

Quick answer

Use Python's asyncio with the OpenAI SDK's AsyncOpenAI client to make concurrent API calls. This lets multiple client.chat.completions.create requests run in parallel, improving throughput and reducing overall latency.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the official OpenAI Python SDK version 1.0 or higher and set your API key as an environment variable.
- Run pip install "openai>=1.0" (quote the requirement so the shell does not treat > as a redirect)
- Set your API key in your shell:
  export OPENAI_API_KEY='your_api_key_here'

Step by step
This example demonstrates making multiple concurrent chat completion requests using asyncio and the SDK's AsyncOpenAI client.
```python
import os
import asyncio
from openai import AsyncOpenAI

async def fetch_response(client, prompt):
    # With AsyncOpenAI, the regular create() method is awaitable
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    prompts = [
        "Hello, how are you?",
        "What is the capital of France?",
        "Explain concurrency in Python.",
        "Write a haiku about AI.",
        "Summarize the benefits of async programming."
    ]
    # Schedule all requests, then wait for them to finish concurrently
    tasks = [fetch_response(client, prompt) for prompt in prompts]
    results = await asyncio.gather(*tasks)
    for i, result in enumerate(results, 1):
        print(f"Response {i}: {result}\n")

if __name__ == "__main__":
    asyncio.run(main())
```

Output
Response 1: I'm doing well, thank you! How can I assist you today?

Response 2: The capital of France is Paris.

Response 3: Concurrency in Python allows multiple tasks to run seemingly at the same time, improving efficiency, especially for I/O-bound operations.

Response 4: AI whispers soft, Learning minds in silent code, Future dawns anew.

Response 5: Async programming boosts performance by enabling tasks to run without waiting, reducing idle time and improving responsiveness.
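asyncio.gather launches every request at once, which can trip rate limits with a long prompt list. A common refinement is to cap in-flight requests with asyncio.Semaphore. The sketch below uses a stand-in coroutine in place of the real API call so it runs without an API key; in real code you would call fetch_response inside the semaphore instead.

```python
import asyncio

MAX_CONCURRENT = 2  # cap on simultaneous in-flight requests
active = 0          # instrumentation: current number of in-flight calls
peak = 0            # instrumentation: highest observed in-flight count

async def fake_api_call(prompt):
    """Stand-in for a real OpenAI request; swap in fetch_response."""
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)  # simulate network latency
    active -= 1
    return f"echo: {prompt}"

async def bounded_fetch(semaphore, prompt):
    # At most MAX_CONCURRENT coroutines may hold the semaphore at once
    async with semaphore:
        return await fake_api_call(prompt)

async def run_prompts():
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    prompts = [f"prompt {i}" for i in range(6)]
    return await asyncio.gather(
        *(bounded_fetch(semaphore, p) for p in prompts)
    )

results = asyncio.run(run_prompts())
print(results)
print("peak concurrency:", peak)
```

gather still returns results in input order; the semaphore only limits how many requests are outstanding at any moment.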
Common variations
You can also use synchronous calls with concurrent.futures.ThreadPoolExecutor for concurrency, or switch models by changing the model parameter. For streaming responses, use the SDK's streaming interface with async iteration.
```python
import os
import concurrent.futures
from openai import OpenAI

def fetch_sync(client, prompt):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
prompts = ["Hello", "Explain async", "Write a poem"]

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(fetch_sync, client, prompt) for prompt in prompts]
    for future in concurrent.futures.as_completed(futures):
        print(future.result())
```

Output
Hello! How can I help you today?
Async programming allows...
Here is a poem for you...
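Note that as_completed yields results in finish order, not prompt order. If you need outputs aligned with inputs, executor.map preserves order, and ThreadPoolExecutor's max_workers parameter bounds concurrency. A minimal sketch, using a stand-in function instead of a live API call:

```python
import concurrent.futures

def fake_fetch(prompt):
    """Stand-in for fetch_sync(client, prompt); swap in the real call."""
    return f"answer to: {prompt}"

prompts = ["Hello", "Explain async", "Write a poem"]

# max_workers bounds the number of simultaneous requests;
# map returns results in the same order as the input prompts.
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    ordered = list(executor.map(fake_fetch, prompts))

print(ordered)
```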
Troubleshooting
- If you get RateLimitError, reduce concurrency or add retry logic with exponential backoff.
- Ensure your API key is set in the OPENAI_API_KEY environment variable; os.environ["OPENAI_API_KEY"] raises KeyError if it is missing.
- asyncio.run() fails with "cannot be called from a running event loop" in environments that already run one, such as Jupyter notebooks; there, call await main() directly instead.
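The retry advice above can be sketched as a small wrapper. This is a generic pattern, not an SDK feature; it is demonstrated here with a stand-in coroutine that fails twice before succeeding, so it runs without an API key. In real code, catch openai.RateLimitError instead of the placeholder exception.

```python
import asyncio

class FakeRateLimitError(Exception):
    """Placeholder for openai.RateLimitError in this offline sketch."""

attempts = 0

async def flaky_call():
    """Stand-in API call that fails twice, then succeeds."""
    global attempts
    attempts += 1
    if attempts < 3:
        raise FakeRateLimitError("slow down")
    return "ok"

async def with_backoff(coro_fn, retries=5, base_delay=0.01):
    # Exponential backoff: wait base_delay * 2**attempt between tries
    for attempt in range(retries):
        try:
            return await coro_fn()
        except FakeRateLimitError:
            if attempt == retries - 1:
                raise  # out of retries: surface the error
            await asyncio.sleep(base_delay * 2 ** attempt)

result = asyncio.run(with_backoff(flaky_call))
print(result, "after", attempts, "attempts")
```

In production, base_delay is typically a second or more, and adding random jitter to each delay helps avoid synchronized retry bursts across tasks.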
Key Takeaways
- Use the SDK's AsyncOpenAI client with Python's asyncio for efficient concurrent calls.
- Async concurrency reduces total latency by overlapping the network waits of multiple requests.
- Handle rate limits by controlling concurrency and implementing retries.
- Synchronous concurrency is possible with ThreadPoolExecutor but less efficient for I/O-bound tasks.
- Always secure your API key via environment variables and never hardcode it.