How-to · Intermediate · 3 min read

How to run assistant on thread in OpenAI

Quick answer
Use Python's threading module to run multiple OpenAI chat-completion calls concurrently: each thread calls client.chat.completions.create with its own prompt, so the requests execute in parallel through the OpenAI SDK. (Note: these are Python threads, not the "threads" of OpenAI's Assistants API.)

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable.

bash
pip install "openai>=1.0"

Step by step

This example demonstrates running two assistant calls concurrently on separate threads using Python's threading module and the OpenAI SDK v1.

python
import os
import threading
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Run one chat-completion call and print its result
def run_assistant_thread(thread_id, prompt):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    print(f"Thread {thread_id} response:\n", response.choices[0].message.content)

# Define prompts for each thread
prompts = [
    "Hello from thread 1! How are you?",
    "Hello from thread 2! Tell me a joke."
]

threads = []

# Create and start threads
for i, prompt in enumerate(prompts, start=1):
    thread = threading.Thread(target=run_assistant_thread, args=(i, prompt))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()
output
Thread 1 response:
 I'm doing great, thanks for asking! How can I assist you today?
Thread 2 response:
 Why did the scarecrow win an award? Because he was outstanding in his field!

Common variations

  • Use concurrent.futures.ThreadPoolExecutor for thread pooling and easier management.
  • Run asynchronous calls with asyncio and the SDK's AsyncOpenAI client.
  • Switch models by changing the model parameter, e.g., gpt-4o-mini.
python
import os
import concurrent.futures
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def run_assistant(prompt):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

prompts = ["Hello from thread 1!", "Hello from thread 2!"]

with concurrent.futures.ThreadPoolExecutor() as executor:
    results = list(executor.map(run_assistant, prompts))

for i, res in enumerate(results, start=1):
    print(f"Thread {i} response:\n{res}")
output
Thread 1 response:
 Hello! How can I assist you today?
Thread 2 response:
 Hello! How can I help you today?

Troubleshooting

  • If you get RateLimitError, reduce concurrency or add retry logic.
  • Ensure your OPENAI_API_KEY is set correctly in your environment.
  • Check for thread-safety issues; the v1 Python client can be shared across threads for making requests, but avoid mutating shared client state concurrently.

Key Takeaways

  • Use Python's threading module to run multiple OpenAI assistant calls concurrently.
  • Always get your API key from environment variables for security.
  • Consider ThreadPoolExecutor for easier thread management and scaling.
Verified 2026-04 · gpt-4o, gpt-4o-mini