How to use parallel function calling in OpenAI
Quick answer
Use the OpenAI Python SDK to send multiple chat completion requests concurrently with asyncio or threading. Each request can specify the functions and function_call parameters to invoke structured function calls in parallel, improving throughput and reducing overall latency.

Prerequisites
- Python 3.8+
- An OpenAI API key (free tier works)
- pip install openai>=1.0
- Basic knowledge of Python asyncio or concurrent.futures
Setup
Install the latest OpenAI Python SDK and set your API key as an environment variable.
- Run pip install openai to install the SDK.
- Set your API key in your shell: export OPENAI_API_KEY='your_api_key_here' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key_here" (Windows).

Step by step
This example demonstrates how to call multiple OpenAI chat completions with function calling in parallel using asyncio. Each call requests a function execution, and results are gathered concurrently.
import os
import asyncio
from openai import AsyncOpenAI

# Use the async client so requests can be awaited concurrently
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Define the function schema for OpenAI function calling
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]
async def call_function(location: str):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"What is the weather like in {location}?"}],
        functions=functions,
        function_call="auto"
    )
    message = response.choices[0].message
    return {
        "location": location,
        "function_call": message.function_call,  # None if the model did not call a function
        "content": message.content
    }
async def main():
    locations = ["New York, NY", "Los Angeles, CA", "Chicago, IL"]
    tasks = [call_function(loc) for loc in locations]
    results = await asyncio.gather(*tasks)
    for result in results:
        print(f"Location: {result['location']}")
        print(f"Function call: {result['function_call']}")
        print(f"Content: {result['content']}\n")
if __name__ == "__main__":
    asyncio.run(main())

Output
Location: New York, NY
Function call: {'name': 'get_current_weather', 'arguments': '{"location": "New York, NY", "unit": "fahrenheit"}'}
Content:
Location: Los Angeles, CA
Function call: {'name': 'get_current_weather', 'arguments': '{"location": "Los Angeles, CA", "unit": "fahrenheit"}'}
Content:
Location: Chicago, IL
Function call: {'name': 'get_current_weather', 'arguments': '{"location": "Chicago, IL", "unit": "fahrenheit"}'}
Content:
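Firing all requests at once with asyncio.gather can trip rate limits. One way to cap in-flight requests is asyncio.Semaphore; below is a minimal sketch (the gather_limited helper and the limit of 3 are illustrative choices, not part of the OpenAI SDK):

```python
import asyncio

async def gather_limited(coros, limit):
    # Allow at most `limit` coroutines to run at the same time
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(run(c) for c in coros))

# Usage with the call_function coroutine from the example above:
# results = await gather_limited(
#     [call_function(loc) for loc in locations], limit=3
# )
```

Because the semaphore is acquired around each awaited call, excess coroutines simply wait their turn rather than hitting the API simultaneously.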
Common variations
You can also use concurrent.futures.ThreadPoolExecutor for parallel calls in synchronous code. You can swap in other OpenAI models such as gpt-4.1 or gpt-4o. For streaming responses, pass stream=True and iterate over the response chunks asynchronously. Note that newer SDK versions prefer the tools and tool_choice parameters over the deprecated functions and function_call, although both still work.
import os
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]
def call_function(location: str):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"What is the weather like in {location}?"}],
        functions=functions,
        function_call="auto"
    )
    message = response.choices[0].message
    return {
        "location": location,
        "function_call": message.function_call,  # None if the model did not call a function
        "content": message.content
    }
locations = ["Miami, FL", "Seattle, WA", "Denver, CO"]
with ThreadPoolExecutor() as executor:
    results = list(executor.map(call_function, locations))

for result in results:
    print(f"Location: {result['location']}")
    print(f"Function call: {result['function_call']}")
    print(f"Content: {result['content']}\n")

Output
Location: Miami, FL
Function call: {'name': 'get_current_weather', 'arguments': '{"location": "Miami, FL", "unit": "fahrenheit"}'}
Content:
Location: Seattle, WA
Function call: {'name': 'get_current_weather', 'arguments': '{"location": "Seattle, WA", "unit": "fahrenheit"}'}
Content:
Location: Denver, CO
Function call: {'name': 'get_current_weather', 'arguments': '{"location": "Denver, CO", "unit": "fahrenheit"}'}
Content:
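The streaming variation mentioned above can be sketched as follows. This is a minimal, hedged example assuming the v1 async client; the stream_answer and join_chunks helpers are illustrative names, not part of the SDK:

```python
def join_chunks(parts):
    # Concatenate streamed delta fragments, skipping None/empty pieces
    return "".join(p for p in parts if p)

async def stream_answer(question: str) -> str:
    # Requires `pip install openai` and OPENAI_API_KEY set in the environment
    from openai import AsyncOpenAI

    client = AsyncOpenAI()
    # stream=True returns an async iterator of chunks instead of one response
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
        stream=True,
    )
    parts = []
    async for chunk in stream:
        if chunk.choices:
            parts.append(chunk.choices[0].delta.content)
    return join_chunks(parts)

# Usage (makes a real API call):
# import asyncio
# print(asyncio.run(stream_answer("Summarize asyncio in one sentence.")))
```

Each chunk carries an incremental delta rather than the full message, so the fragments must be accumulated to reconstruct the complete reply.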
Troubleshooting
- If you get RateLimitError, reduce concurrency or add retry logic with exponential backoff.
- If function_call is missing from the response, verify your functions parameter is correctly formatted and that the model supports function calling.
- For authentication errors, ensure OPENAI_API_KEY is set correctly in your environment.
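The exponential-backoff advice above can be sketched as a generic retry wrapper. The retry_with_backoff helper and its defaults are illustrative; in real use you would pass the SDK's RateLimitError as the retryable exception type:

```python
import time
import random

def retry_with_backoff(fn, retryable, max_retries=5, base=1.0, cap=30.0):
    # Call fn(), retrying on `retryable` exceptions with exponential backoff plus jitter
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            # Delay doubles each attempt, capped, with a little jitter to avoid thundering herds
            delay = min(cap, base * 2 ** attempt) + random.uniform(0, base)
            time.sleep(delay)

# Usage with the synchronous example above (RateLimitError comes from the openai package):
# from openai import RateLimitError
# result = retry_with_backoff(lambda: call_function("Miami, FL"), RateLimitError)
```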
Key Takeaways
- Use Python's asyncio or ThreadPoolExecutor to call OpenAI chat completions with function calling in parallel.
- Specify the functions and function_call parameters in each request to enable structured function calls.
- Handle rate limits by controlling concurrency and implementing retries.
- Parallel calls improve throughput and reduce latency when invoking multiple function calls.
- Always use environment variables for API keys and the latest OpenAI SDK v1+ syntax.
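As a final note on SDK syntax: current OpenAI SDK versions prefer the tools and tool_choice parameters over the older functions and function_call shown in the examples. A sketch of the same weather schema in that form (the request in the comment is the standard Chat Completions call):

```python
# The same weather schema, wrapped in the newer tools format:
# each entry declares "type": "function" and nests the schema under "function"
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

# Usage (requires the openai package and a valid OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": "Weather in Austin, TX?"}],
#     tools=tools,
#     tool_choice="auto",
# )
# tool_call = response.choices[0].message.tool_calls[0]
```

With this form, results arrive on message.tool_calls (a list) instead of message.function_call, so the model can request several function calls in a single response.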