How to use parallel tool calling in OpenAI

Quick answer

Use Python's concurrency libraries, such as asyncio or concurrent.futures, to call OpenAI's chat.completions.create method in parallel. This lets you run multiple tool calls simultaneously, improving throughput and reducing total latency.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the official OpenAI Python SDK and set your API key as an environment variable.
- Run `pip install openai` to install the SDK.
- Set your API key in your shell: `export OPENAI_API_KEY='your_api_key_here'` (Linux/macOS) or `setx OPENAI_API_KEY "your_api_key_here"` (Windows).
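Before running the examples, you can sanity-check that the key is actually visible to Python. This is a minimal sketch; `api_key_configured` is an illustrative helper, not part of the SDK:

```python
import os

def api_key_configured() -> bool:
    # True only if OPENAI_API_KEY is set to a non-empty, non-whitespace value.
    return os.environ.get("OPENAI_API_KEY", "").strip() != ""
```

If this returns False, the export/setx step above did not take effect in the shell that launched Python.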
Step by step
This example demonstrates how to perform parallel tool calls to OpenAI's chat completions endpoint using concurrent.futures.ThreadPoolExecutor. Each call sends a different prompt concurrently and collects the responses.
```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

prompts = [
    "Translate 'Hello, world!' to French.",
    "Summarize the plot of 'The Great Gatsby'.",
    "Generate a haiku about spring.",
    "Explain the theory of relativity in simple terms."
]

def call_openai(prompt):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

results = []
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(call_openai, prompt) for prompt in prompts]
    # Note: as_completed yields futures in completion order,
    # so results may not line up with the order of `prompts`.
    for future in as_completed(futures):
        results.append(future.result())

for i, output in enumerate(results, 1):
    print(f"Response {i}: {output}\n")
```

Output
Response 1: Bonjour le monde!
Response 2: The Great Gatsby is a novel about the mysterious millionaire Jay Gatsby and his obsession with Daisy Buchanan, exploring themes of wealth, love, and the American Dream.
Response 3: Spring whispers softly, Cherry blossoms gently fall, New life wakes the earth.
Response 4: The theory of relativity explains how space and time are linked and how gravity affects them, showing that time can slow down near massive objects.
Common variations
You can also use asyncio with the OpenAI SDK's async client (AsyncOpenAI) for parallel calls. Alternatively, adjust max_workers for more concurrency, or switch to a model like gpt-4o-mini for faster, cheaper calls.
```python
import asyncio
import os

from openai import AsyncOpenAI

# The async client exposes the same API surface as OpenAI;
# each call is awaited rather than blocking.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def call_openai_async(prompt):
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompts = [
        "Write a limerick about cats.",
        "Explain quantum computing simply.",
        "List three benefits of meditation.",
        "Translate 'Good morning' to Spanish."
    ]
    tasks = [call_openai_async(p) for p in prompts]
    # gather preserves input order, so results line up with prompts.
    results = await asyncio.gather(*tasks)
    for i, res in enumerate(results, 1):
        print(f"Async Response {i}: {res}\n")

if __name__ == "__main__":
    asyncio.run(main())
```

Output
Async Response 1: There once was a cat with a hat...
Async Response 2: Quantum computing uses quantum bits that can be both 0 and 1 at the same time, allowing complex problems to be solved faster.
Async Response 3: Meditation reduces stress, improves focus, and enhances emotional health.
Async Response 4: Buenos días
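asyncio.gather launches every task at once, which can trip rate limits when the prompt list is long. A common pattern is to cap the number of in-flight requests with asyncio.Semaphore. This is a minimal sketch; `bounded_gather` is a hypothetical helper name, not an SDK function:

```python
import asyncio

async def bounded_gather(coros, limit=4):
    # Run coroutines concurrently, but never more than `limit` at once.
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    # gather preserves input order, so results match the input list.
    return await asyncio.gather(*(run(c) for c in coros))
```

You could then replace the plain gather in the example above with something like `results = await bounded_gather([call_openai_async(p) for p in prompts], limit=4)`.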
Troubleshooting

- If you hit rate limit errors, reduce concurrency by lowering max_workers, or add retry logic with exponential backoff.
- Ensure your API key is correctly set in os.environ["OPENAI_API_KEY"].
- For network timeouts, increase the client's timeout setting or check your internet connection.
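The exponential-backoff advice above can be sketched as a small wrapper. `with_backoff` is an illustrative helper, not part of the OpenAI SDK (the 1.x SDK also retries some transient errors automatically, configurable via the client's max_retries option):

```python
import random
import time

def with_backoff(fn, retries=5, base=1.0, cap=30.0):
    # Call fn(); on failure, sleep base * 2**attempt seconds
    # (capped, plus a little jitter) before retrying.
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries; surface the last error
            delay = min(cap, base * 2 ** attempt)
            time.sleep(delay + random.uniform(0, 0.1))
```

In the threaded example, you would submit `lambda: with_backoff(lambda: call_openai(prompt))`-style wrappers instead of calling the API directly.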
Key Takeaways
- Use Python concurrency tools like ThreadPoolExecutor or asyncio to call OpenAI APIs in parallel.
- Parallel calls reduce total latency when running multiple independent prompts or tools.
- Adjust concurrency levels to avoid rate limits and optimize throughput.
- The OpenAI Python SDK provides an AsyncOpenAI client for efficient async parallelism.
- Always secure your API key via environment variables to avoid leaks.