How to use parallel function calling in OpenAI
Quick answer
Use the OpenAI Python SDK to send multiple chat completion requests concurrently with asyncio or threading. Each request can specify the functions and function_call parameters to invoke structured function calls in parallel, improving throughput and reducing overall latency.

Prerequisites
- Python 3.8+
- An OpenAI API key (free tier works)
- pip install openai>=1.0
- Basic knowledge of Python asyncio or concurrent.futures
Setup
Install the latest OpenAI Python SDK and set your API key as an environment variable.
- Run pip install openai to install the SDK.
- Set your API key in your shell: export OPENAI_API_KEY='your_api_key_here' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key_here" (Windows).

Step by step
This example demonstrates how to call multiple OpenAI chat completions with function calling in parallel using asyncio. Each call requests a function execution, and results are gathered concurrently.
import os
import asyncio
from openai import AsyncOpenAI

# Use the async client so requests can be awaited concurrently
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Define the function schema for OpenAI function calling
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]
async def call_function(location: str):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"What is the weather like in {location}?"}],
        functions=functions,
        function_call="auto"
    )
    message = response.choices[0].message
    return {
        "location": location,
        "function_call": message.function_call,  # None if the model did not call a function
        "content": message.content
    }
async def main():
    locations = ["New York, NY", "Los Angeles, CA", "Chicago, IL"]
    tasks = [call_function(loc) for loc in locations]
    results = await asyncio.gather(*tasks)
    for result in results:
        print(f"Location: {result['location']}")
        print(f"Function call: {result['function_call']}")
        print(f"Content: {result['content']}\n")
if __name__ == "__main__":
    asyncio.run(main())

Output
Location: New York, NY
Function call: {'name': 'get_current_weather', 'arguments': '{"location": "New York, NY", "unit": "fahrenheit"}'}
Content:
Location: Los Angeles, CA
Function call: {'name': 'get_current_weather', 'arguments': '{"location": "Los Angeles, CA", "unit": "fahrenheit"}'}
Content:
Location: Chicago, IL
Function call: {'name': 'get_current_weather', 'arguments': '{"location": "Chicago, IL", "unit": "fahrenheit"}'}
Content:
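Firing all requests at once with asyncio.gather can trip rate limits. One way to cap in-flight requests is asyncio.Semaphore; below is a minimal sketch (the gather_limited helper and the limit of 3 are illustrative choices, not part of the OpenAI SDK):

```python
import asyncio

async def gather_limited(coros, limit):
    # Allow at most `limit` coroutines to run at the same time
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(run(c) for c in coros))

# Usage with the call_function coroutine from the example above:
# results = await gather_limited(
#     [call_function(loc) for loc in locations], limit=3
# )
```

Because the semaphore is acquired around each awaited call, excess coroutines simply wait their turn rather than hitting the API simultaneously.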
Common variations
You can also use concurrent.futures.ThreadPoolExecutor for parallel calls in synchronous code. You can swap in other OpenAI models such as gpt-4.1 or gpt-4o. For streaming responses, pass stream=True and iterate over the response chunks asynchronously. Note that newer SDK versions prefer the tools and tool_choice parameters over the deprecated functions and function_call, although both still work.
import os
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]
def call_function(location: str):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"What is the weather like in {location}?"}],
        functions=functions,
        function_call="auto"
    )
    message = response.choices[0].message
    return {
        "location": location,
        "function_call": message.function_call,  # None if the model did not call a function
        "content": message.content
    }
locations = ["Miami, FL", "Seattle, WA", "Denver, CO"]
with ThreadPoolExecutor() as executor:
    results = list(executor.map(call_function, locations))

for result in results:
    print(f"Location: {result['location']}")
    print(f"Function call: {result['function_call']}")
    print(f"Content: {result['content']}\n")

Output
Location: Miami, FL
Function call: {'name': 'get_current_weather', 'arguments': '{"location": "Miami, FL", "unit": "fahrenheit"}'}
Content:
Location: Seattle, WA
Function call: {'name': 'get_current_weather', 'arguments': '{"location": "Seattle, WA", "unit": "fahrenheit"}'}
Content:
Location: Denver, CO
Function call: {'name': 'get_current_weather', 'arguments': '{"location": "Denver, CO", "unit": "fahrenheit"}'}
Content:
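The streaming variation mentioned above can be sketched as follows. This is a minimal, hedged example assuming the v1 async client; the stream_answer and join_chunks helpers are illustrative names, not part of the SDK:

```python
def join_chunks(parts):
    # Concatenate streamed delta fragments, skipping None/empty pieces
    return "".join(p for p in parts if p)

async def stream_answer(question: str) -> str:
    # Requires `pip install openai` and OPENAI_API_KEY set in the environment
    from openai import AsyncOpenAI

    client = AsyncOpenAI()
    # stream=True returns an async iterator of chunks instead of one response
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
        stream=True,
    )
    parts = []
    async for chunk in stream:
        if chunk.choices:
            parts.append(chunk.choices[0].delta.content)
    return join_chunks(parts)

# Usage (makes a real API call):
# import asyncio
# print(asyncio.run(stream_answer("Summarize asyncio in one sentence.")))
```

Each chunk carries an incremental delta rather than the full message, so the fragments must be accumulated to reconstruct the complete reply.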
Troubleshooting
- If you get RateLimitError, reduce concurrency or add retry logic with exponential backoff.
- If function_call is missing from the response, verify your functions parameter is correctly formatted and that the model supports function calling.
- For authentication errors, ensure OPENAI_API_KEY is set correctly in your environment.
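The exponential-backoff advice above can be sketched as a generic retry wrapper. The retry_with_backoff helper and its defaults are illustrative; in real use you would pass the SDK's RateLimitError as the retryable exception type:

```python
import time
import random

def retry_with_backoff(fn, retryable, max_retries=5, base=1.0, cap=30.0):
    # Call fn(), retrying on `retryable` exceptions with exponential backoff plus jitter
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            # Delay doubles each attempt, capped, with a little jitter to avoid thundering herds
            delay = min(cap, base * 2 ** attempt) + random.uniform(0, base)
            time.sleep(delay)

# Usage with the synchronous example above (RateLimitError comes from the openai package):
# from openai import RateLimitError
# result = retry_with_backoff(lambda: call_function("Miami, FL"), RateLimitError)
```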
Key Takeaways
- Use Python's asyncio or ThreadPoolExecutor to call OpenAI chat completions with function calling in parallel.
- Specify the functions and function_call parameters in each request to enable structured function calls.
- Handle rate limits by controlling concurrency and implementing retries.
- Parallel calls improve throughput and reduce latency when invoking multiple function calls.
- Always use environment variables for API keys and the latest OpenAI SDK v1+ syntax.
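As a final note on SDK syntax: current OpenAI SDK versions prefer the tools and tool_choice parameters over the older functions and function_call shown in the examples. A sketch of the same weather schema in that form (the request in the comment is the standard Chat Completions call):

```python
# The same weather schema, wrapped in the newer tools format:
# each entry declares "type": "function" and nests the schema under "function"
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

# Usage (requires the openai package and a valid OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": "Weather in Austin, TX?"}],
#     tools=tools,
#     tool_choice="auto",
# )
# tool_call = response.choices[0].message.tool_calls[0]
```

With this form, results arrive on message.tool_calls (a list) instead of message.function_call, so the model can request several function calls in a single response.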