Run status polling: polling vs webhooks
Why this matters
Assistants runs are asynchronous: after creating one, you must wait for it to finish. Naive polling wastes request quota and money; webhooks eliminate the wasted requests but require infrastructure. Even senior engineers choose wrong here and either drain budgets or miss failures.
Explanation
What it does: After you create an Assistants run with client.beta.threads.runs.create(), the run executes asynchronously. You call client.beta.threads.runs.retrieve(thread_id=..., run_id=...) repeatedly to check its status until it reaches a terminal state such as completed, failed, cancelled, or expired.

How it works: Each retrieve() call is a full API request that counts against your request quota. If you poll every 100ms, that's 10 calls/second per run, which adds up quickly on a production workload. Webhooks flip the model: the service pushes status updates to your endpoint, you acknowledge them, and no requests are spent on runs whose state has not changed. The tradeoff is that you need a public HTTPS endpoint, handling for retried deliveries, and idempotency. Before designing around webhooks, confirm that your API version actually emits webhook events for the kind of run you use.

When to use polling: Prototyping, tests, or runs that are guaranteed to be short. Polling with exponential backoff (start at 500ms, cap at 5s) works for human-scale workflows. For production at scale, webhooks are the better fit.
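The idempotency requirement for a webhook receiver can be sketched as follows. This is a minimal illustration, not a documented integration: the event shape (`id`, `run_id`, `status`) and the `RunEventHandler` class are hypothetical, and a real endpoint would also verify the sender's signature and keep the seen-event set in durable storage.

```python
import json

# Hypothetical event shape: {"id": "...", "run_id": "...", "status": "..."}.
# This is an illustrative assumption, not a documented OpenAI payload.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


class RunEventHandler:
    """Processes pushed status events at most once per event id."""

    def __init__(self):
        self.seen_event_ids = set()  # in production this would be a durable store
        self.final_status = {}       # run_id -> terminal status

    def handle(self, raw_body: str) -> int:
        """Return the HTTP status code the endpoint should respond with."""
        try:
            event = json.loads(raw_body)
            event_id = event["id"]
            run_id = event["run_id"]
            status = event["status"]
        except (json.JSONDecodeError, KeyError, TypeError):
            return 400  # malformed delivery

        if event_id in self.seen_event_ids:
            return 200  # duplicate delivery: acknowledge without reprocessing

        self.seen_event_ids.add(event_id)
        if status in TERMINAL_STATUSES:
            self.final_status[run_id] = status
        return 200
```

Acknowledging a duplicate with 200 rather than an error matters because a sender that retries on failure would otherwise keep redelivering the same event.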
Request code
```python
import time

from openai import OpenAI

client = OpenAI()

# Create a thread
thread = client.beta.threads.create()
thread_id = thread.id

# Add a message
client.beta.threads.messages.create(
    thread_id=thread_id,
    role="user",
    content="What is 2+2?",
)

# Create a run (async)
run = client.beta.threads.runs.create(
    thread_id=thread_id,
    assistant_id="asst_abc123xyz",
)
run_id = run.id

# Polling with exponential backoff
wait_time = 0.5        # first sleep, in seconds
max_wait_time = 5.0    # cap on any single sleep
max_total_wait = 60.0  # give up after this many seconds
start_time = time.monotonic()

while True:
    elapsed = time.monotonic() - start_time
    if elapsed > max_total_wait:
        print(f"Timeout: run did not complete in {max_total_wait}s")
        break

    run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)

    if run.status == "completed":
        print(f"Run completed. Status: {run.status}")
        # messages.list returns newest first, so data[0] is the assistant's reply
        messages = client.beta.threads.messages.list(thread_id=thread_id)
        print(f"Response: {messages.data[0].content[0].text.value}")
        break
    elif run.status in ["failed", "expired", "cancelled", "incomplete"]:
        print(f"Run failed with status: {run.status}")
        if run.last_error:
            print(f"Error: {run.last_error.message}")
        break
    elif run.status == "requires_action":
        # This loop does not submit tool outputs, so waiting longer cannot help
        print("Run is waiting for tool outputs; stopping.")
        break
    else:
        print(f"Status: {run.status}. Waiting {wait_time}s...")
        time.sleep(wait_time)
        wait_time = min(wait_time * 1.5, max_wait_time)
```
Authentication
Your OpenAI API key must be available before the client is instantiated, because the SDK reads the OPENAI_API_KEY environment variable at instantiation time. Export it with export OPENAI_API_KEY='sk-...', or pass it directly: client = OpenAI(api_key='sk-...').
Response shape
| Field | Example value |
|---|---|
| id | run_abc123xyz |
| object | thread.run |
| thread_id | thread_xyz789 |
| assistant_id | asst_def456 |
| status | one of completed, queued, in_progress, requires_action, failed, expired, cancelled |
| created_at | 1704067200 |
| started_at | 1704067201 |
| completed_at | 1704067210 |
| expires_at | 1704153600 |
| last_error | null unless the run failed |
| model | gpt-4-turbo |
| instructions | You are a helpful assistant. |
| tools | array of tool definitions (value not shown) |
| metadata | object of key-value pairs (value not shown) |
Field guide
status: Current state of the run. `queued` = waiting to start, `in_progress` = executing, `requires_action` = waiting for you to submit tool outputs, `completed` = done, `failed` = an error occurred, `expired` = not finished before `expires_at`, `cancelled` = cancelled on request.
last_error: Populated only if status is `failed`. Contains `code` and `message` explaining why the run failed. Check this before retrying.
started_at: Unix timestamp when execution began. Null if not yet started. Use it to calculate how long the run has been running.
expires_at: Unix timestamp after which the run expires and can no longer be continued. Runs expire about 10 minutes after creation if they have not finished. If your polling outlasts that window, you must create a new run.
required_action: Present only while status is `requires_action`. Its `submit_tool_outputs.tool_calls` array lists the calls you must process; submit the results with `client.beta.threads.runs.submit_tool_outputs()` to resume the run.
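Turning the pending tool calls into the payload for `submit_tool_outputs()` might look like the sketch below. The `add` tool, the `TOOLS` registry, and `build_tool_outputs` are hypothetical names for illustration, and `fake_run` is a stand-in that mimics the attribute layout of a run in `requires_action` so the example runs without network access.

```python
import json
from types import SimpleNamespace


# Hypothetical local tool; the name and behaviour are illustrative only.
def add(a, b):
    return a + b


TOOLS = {"add": add}


def build_tool_outputs(run):
    """Turn the run's pending tool calls into a list of tool output dicts."""
    outputs = []
    for call in run.required_action.submit_tool_outputs.tool_calls:
        func = TOOLS.get(call.function.name)
        if func is None:
            result = {"error": f"unknown tool {call.function.name}"}
        else:
            args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
            result = func(**args)
        outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
    return outputs


# Stand-in for a run in requires_action (assumption: mirrors the SDK's attributes).
fake_run = SimpleNamespace(
    required_action=SimpleNamespace(
        submit_tool_outputs=SimpleNamespace(
            tool_calls=[
                SimpleNamespace(
                    id="call_1",
                    function=SimpleNamespace(name="add", arguments='{"a": 2, "b": 2}'),
                )
            ]
        )
    )
)

print(build_tool_outputs(fake_run))  # [{'tool_call_id': 'call_1', 'output': '4'}]
```

With a real run, the returned list would be passed as `tool_outputs` to `client.beta.threads.runs.submit_tool_outputs(thread_id=..., run_id=..., tool_outputs=...)`.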
Setup trap
Setting OPENAI_API_KEY in your environment after creating the OpenAI() client has no effect: the SDK reads the key at instantiation time. If your code sets the env var inside a function but creates the client at module load time, client creation fails with a missing-key error before that function ever runs, or the client keeps whatever older key was already set. Always set OPENAI_API_KEY before the client is created, or pass api_key directly to OpenAI(api_key=...).
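One way to sidestep the ordering problem is to build the client lazily, on first use, instead of at module load time. The sketch below assumes a hypothetical helper named `make_client_factory`; it takes the client class as a parameter so it can be exercised without the SDK installed.

```python
import os
from functools import lru_cache


def make_client_factory(client_cls):
    """Return a zero-argument getter that builds the client on first use."""

    @lru_cache(maxsize=1)
    def get_client():
        # The key is read here, at first call, not when the module is imported.
        api_key = os.environ.get("OPENAI_API_KEY")
        if not api_key:
            raise RuntimeError(
                "OPENAI_API_KEY is not set; export it before the first API call"
            )
        return client_cls(api_key=api_key)

    return get_client
```

Typical use would be `get_client = make_client_factory(OpenAI)` at module level and `get_client()` inside the functions that make requests. Because `lru_cache` does not cache exceptions, a call that failed for a missing key can succeed later once the variable is set.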
Cost
Polling is deceptively expensive. At 1 request/second for 10 seconds per run, that's 10 API calls. At 1M runs/month, that's 10M calls (about $1,500 at an assumed $0.00015/call for gpt-4-turbo). Webhooks eliminate the wasted calls entirely: a single webhook endpoint costs $0 in OpenAI API fees and ~$20–50/month in cloud infrastructure. By this estimate, the break-even point is roughly 50 concurrent runs.
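The arithmetic can be worked through in a few lines. The sketch below counts status checks for a 10-second run under a fixed 1-second interval and under the backoff schedule used earlier (start 0.5s, multiply by 1.5, cap 5s). It assumes the first check happens after the first sleep, and it treats the $0.00015 per-call figure as an assumption rather than a published price.

```python
def count_polls(run_seconds, first_wait, factor=1.0, max_wait=float("inf")):
    """Count status checks until a run lasting run_seconds is observed as finished."""
    t, wait, polls = 0.0, first_wait, 0
    while t < run_seconds:
        t += wait      # sleep, then check
        polls += 1
        wait = min(wait * factor, max_wait)
    return polls


RUNS_PER_MONTH = 1_000_000
ASSUMED_COST_PER_CALL = 0.00015  # assumption carried over from the text

fixed = count_polls(10, first_wait=1.0)
backoff = count_polls(10, first_wait=0.5, factor=1.5, max_wait=5.0)

for label, polls in [("fixed 1s", fixed), ("backoff", backoff)]:
    monthly_calls = polls * RUNS_PER_MONTH
    print(f"{label}: {polls} calls/run, {monthly_calls:,} calls/month, "
          f"${monthly_calls * ASSUMED_COST_PER_CALL:,.0f}/month")
# fixed 1s: 10 calls/run, 10,000,000 calls/month, $1,500/month
# backoff: 6 calls/run, 6,000,000 calls/month, $900/month
```

Under these assumptions backoff cuts calls by 40% for this run length, at the price of noticing completion a fraction of a second later.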
Rate limits
Polling without backoff burns through rate limits quickly. Each `retrieve()` counts against your RPM (requests-per-minute) quota, so with many concurrent runs, naive polling can exhaust it in seconds. Implement exponential backoff, and stagger or batch status checks if you are polling more than 50 runs. Better: use webhooks and skip polling entirely.
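A rough way to size the polling interval against an RPM quota is shown below. The helper name `min_poll_interval`, the `reserve_fraction` parameter, and the example quota of 500 RPM are all hypothetical; substitute the limit that applies to your account.

```python
def min_poll_interval(concurrent_runs, rpm_limit, reserve_fraction=0.5):
    """Smallest per-run poll interval (seconds) that keeps polling within its RPM share.

    reserve_fraction is the share of the quota kept free for non-polling requests.
    """
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    polling_budget_per_minute = rpm_limit * (1 - reserve_fraction)
    return 60.0 * concurrent_runs / polling_budget_per_minute


# 100 concurrent runs sharing half of a 500 RPM quota
print(min_poll_interval(100, 500))  # 24.0
```

If the resulting interval is longer than the latency you can tolerate, that is a signal the workload has outgrown polling.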
Common gotcha
Polling without backoff or a maximum wait. Developers write while True: run = client.beta.threads.runs.retrieve(...) in a tight loop. With 100 concurrent runs, this hits rate limits within seconds. Add exponential backoff (start at 0.5s, cap at 5s) and an overall timeout (60s max). Also: checking run.status immediately after create() will almost always show queued, because the run hasn't started yet.
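The backoff-plus-timeout advice can be packaged as a small reusable function. This is a sketch, not an SDK feature: `poll_until_terminal` and its parameters are hypothetical names, and the status source, sleep function, and clock are injected so the logic can be tested without network calls.

```python
import time

TERMINAL = {"completed", "failed", "expired", "cancelled"}


def poll_until_terminal(get_status, first_wait=0.5, factor=1.5, max_wait=5.0,
                        timeout=60.0, sleep=time.sleep, clock=time.monotonic):
    """Call get_status() with exponential backoff until a stopping status or timeout.

    Returns the last status seen; raises TimeoutError if the deadline would pass first.
    """
    deadline = clock() + timeout
    wait = first_wait
    while True:
        status = get_status()
        if status in TERMINAL or status == "requires_action":
            return status  # the caller decides what to do next
        if clock() + wait > deadline:
            raise TimeoutError(f"run still {status!r} after {timeout}s")
        sleep(wait)
        wait = min(wait * factor, max_wait)
```

Against the real API, `get_status` could be `lambda: client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id).status`. Recent versions of the Python SDK also appear to ship their own polling helper (`create_and_poll`); check your installed version before writing your own.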
Error recovery
AuthenticationError, NotFoundError, RateLimitError, APIError with 500
Experienced dev note
The real cost of polling is hidden. Many teams poll every 500ms to feel responsive, then are shocked by $10k/month API bills. Set a polling interval of at least 2–3 seconds: runs rarely complete faster, and you cut roughly 80% of calls for negligible added latency.

For production, webhooks pay for themselves after 1 month. Set up a simple Lambda + SQS or FastAPI endpoint to receive webhook events. You also get better error observability: webhook events carry error details, whereas polling makes you inspect last_error yourself.

One more thing: the requires_action status is a foot-gun. If your code doesn't handle tool use, the run sits in requires_action until it expires, and a loop that only waits for completed spins the whole time. Always check for it before polling again.
Check your understanding
Your assistant handles a tool call (function invocation) during a run, and you've submitted the tool outputs. You then immediately poll for status. Why might the status still be requires_action, and what should you do?
Answer hint
Tool output submission is asynchronous. The server needs time to process your submission and move the run forward, so poll again after a small delay rather than assuming the status changes instantly. Always treat `requires_action` as a loop: retrieve → parse tools → submit outputs → wait → retrieve again.
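The retrieve → submit → wait → retrieve loop can be sketched as a small driver. Everything here is illustrative: `drive_run` is a hypothetical helper, runs are represented as plain dicts with `status` and `tool_calls` keys rather than SDK objects, and the API calls are injected as functions. It remembers which tool calls it has already answered so that a stale `requires_action` status does not trigger a second submission.

```python
import time


def drive_run(retrieve, submit_outputs, build_outputs,
              wait=1.0, max_rounds=20, sleep=time.sleep):
    """Poll a run, answering tool calls once each, until it leaves the active states."""
    submitted = set()  # tool call ids already answered
    for _ in range(max_rounds):
        run = retrieve()
        status = run["status"]
        if status == "requires_action":
            pending = [c for c in run["tool_calls"] if c["id"] not in submitted]
            if pending:
                submit_outputs(build_outputs(pending))
                submitted.update(c["id"] for c in pending)
        elif status not in ("queued", "in_progress"):
            return status  # completed, failed, expired, cancelled, ...
        sleep(wait)  # give the server time before the next retrieve
    raise TimeoutError("run did not reach a terminal status")
```

The `max_rounds` bound plays the role of the overall timeout, so a run that never progresses cannot keep the loop alive indefinitely.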
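client.beta.threads.runs is the Assistants API surface used throughout this page. It lives in the SDK's beta namespace, and OpenAI has since positioned the Responses API as its successor, so check the release notes and deprecation schedule before major upgrades.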