How to use parallel tool calling in OpenAI

Quick answer

Use Python's concurrency libraries, such as asyncio or concurrent.futures, to call OpenAI's chat.completions.create method in parallel. This lets you run multiple tool calls simultaneously, improving throughput and reducing total latency.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the official OpenAI Python SDK and set your API key as an environment variable.
- Run `pip install openai` to install the SDK.
- Set your API key in your shell: `export OPENAI_API_KEY='your_api_key_here'` (Linux/macOS) or `setx OPENAI_API_KEY "your_api_key_here"` (Windows).
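Before running the examples, you can sanity-check that the key is actually visible to Python. This is a minimal sketch; `api_key_configured` is an illustrative helper, not part of the SDK:

```python
import os

def api_key_configured() -> bool:
    # True only if OPENAI_API_KEY is set to a non-empty, non-whitespace value.
    return os.environ.get("OPENAI_API_KEY", "").strip() != ""
```

If this returns False, the export/setx step above did not take effect in the shell that launched Python.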
Step by step
This example demonstrates how to perform parallel tool calls to OpenAI's chat completions endpoint using concurrent.futures.ThreadPoolExecutor. Each call sends a different prompt concurrently and collects the responses.
```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

prompts = [
    "Translate 'Hello, world!' to French.",
    "Summarize the plot of 'The Great Gatsby'.",
    "Generate a haiku about spring.",
    "Explain the theory of relativity in simple terms."
]

def call_openai(prompt):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

results = []
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(call_openai, prompt) for prompt in prompts]
    # Note: as_completed yields futures in completion order,
    # so results may not line up with the order of `prompts`.
    for future in as_completed(futures):
        results.append(future.result())

for i, output in enumerate(results, 1):
    print(f"Response {i}: {output}\n")
```

Output
Response 1: Bonjour le monde!
Response 2: The Great Gatsby is a novel about the mysterious millionaire Jay Gatsby and his obsession with Daisy Buchanan, exploring themes of wealth, love, and the American Dream.
Response 3: Spring whispers softly, Cherry blossoms gently fall, New life wakes the earth.
Response 4: The theory of relativity explains how space and time are linked and how gravity affects them, showing that time can slow down near massive objects.
Common variations
You can also use asyncio with the OpenAI SDK's async client (AsyncOpenAI) for parallel calls. Alternatively, adjust max_workers for more concurrency, or switch to a model like gpt-4o-mini for faster, cheaper calls.
```python
import asyncio
import os

from openai import AsyncOpenAI

# The async client exposes the same API surface as OpenAI;
# each call is awaited rather than blocking.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def call_openai_async(prompt):
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompts = [
        "Write a limerick about cats.",
        "Explain quantum computing simply.",
        "List three benefits of meditation.",
        "Translate 'Good morning' to Spanish."
    ]
    tasks = [call_openai_async(p) for p in prompts]
    # gather preserves input order, so results line up with prompts.
    results = await asyncio.gather(*tasks)
    for i, res in enumerate(results, 1):
        print(f"Async Response {i}: {res}\n")

if __name__ == "__main__":
    asyncio.run(main())
```

Output
Async Response 1: There once was a cat with a hat...
Async Response 2: Quantum computing uses quantum bits that can be both 0 and 1 at the same time, allowing complex problems to be solved faster.
Async Response 3: Meditation reduces stress, improves focus, and enhances emotional health.
Async Response 4: Buenos días
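asyncio.gather launches every task at once, which can trip rate limits when the prompt list is long. A common pattern is to cap the number of in-flight requests with asyncio.Semaphore. This is a minimal sketch; `bounded_gather` is a hypothetical helper name, not an SDK function:

```python
import asyncio

async def bounded_gather(coros, limit=4):
    # Run coroutines concurrently, but never more than `limit` at once.
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    # gather preserves input order, so results match the input list.
    return await asyncio.gather(*(run(c) for c in coros))
```

You could then replace the plain gather in the example above with something like `results = await bounded_gather([call_openai_async(p) for p in prompts], limit=4)`.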
Troubleshooting

- If you hit rate limit errors, reduce concurrency by lowering max_workers, or add retry logic with exponential backoff.
- Ensure your API key is correctly set in os.environ["OPENAI_API_KEY"].
- For network timeouts, increase the client's timeout setting or check your internet connection.
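The exponential-backoff advice above can be sketched as a small wrapper. `with_backoff` is an illustrative helper, not part of the OpenAI SDK (the 1.x SDK also retries some transient errors automatically, configurable via the client's max_retries option):

```python
import random
import time

def with_backoff(fn, retries=5, base=1.0, cap=30.0):
    # Call fn(); on failure, sleep base * 2**attempt seconds
    # (capped, plus a little jitter) before retrying.
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries; surface the last error
            delay = min(cap, base * 2 ** attempt)
            time.sleep(delay + random.uniform(0, 0.1))
```

In the threaded example, you would submit `lambda: with_backoff(lambda: call_openai(prompt))`-style wrappers instead of calling the API directly.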
Key Takeaways
- Use Python concurrency tools like ThreadPoolExecutor or asyncio to call OpenAI APIs in parallel.
- Parallel calls reduce total latency when running multiple independent prompts or tools.
- Adjust concurrency levels to avoid rate limits and optimize throughput.
- The OpenAI Python SDK provides an AsyncOpenAI client for efficient async parallelism.
- Always secure your API key via environment variables to avoid leaks.