API Advanced hard · 8 min

Run status polling: polling vs webhooks

What you will learn

Poll an Assistants API run's status in a loop with backoff, and know when and why webhooks are the better choice for production.

Why this matters

Assistants runs are asynchronous: you must wait for completion. Naive polling wastes request quota and money; webhooks eliminate wasted requests but require infrastructure. Even senior engineers choose wrong here and either drain budgets or miss failures.

When to use which: poll when testing locally, writing one-shot scripts, or when runs complete within a couple of seconds. Use webhooks for production workloads, more than 100 concurrent runs, or when cost per request matters. Never poll in a tight loop without backoff: you'll hit rate limits in seconds.

Explanation

What it does: After you create an Assistants run with client.beta.threads.runs.create(), the run executes asynchronously. You call client.beta.threads.runs.retrieve(thread_id=..., run_id=...) repeatedly to check status until it reaches a terminal state: completed, failed, expired, cancelled, or incomplete.

How it works: Each retrieve() call is a full API request that counts against your rate limits. If you poll every 100ms, that's 10 calls/second per run, which adds up quickly on a production workload. Webhooks flip the model: the status update is pushed to your endpoint, you acknowledge it, and no requests are wasted. The tradeoff is that you need a public HTTPS endpoint, retry handling, and idempotency. Confirm in the current OpenAI documentation which events can be delivered by webhook for the API you are using before designing around them.

When to use polling: Prototyping, tests, or runs known to be short. Polling with exponential backoff (start at 500ms, cap at 5s) works for human-scale workflows. At production scale, prefer webhooks.
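
To make the difference between a fixed interval and backoff concrete, here is a minimal sketch that counts how many status checks each schedule issues for a run of a given length. The function name and the millisecond units are assumptions made for illustration; it simulates the schedule only and makes no API calls.

```python
def count_polls(run_ms, initial_ms, factor=1.0, cap_ms=None):
    """Count status checks issued until a run of length run_ms is seen as finished.

    One check happens at t=0, then one after each sleep. With factor=1.0
    the interval stays fixed; with factor>1 it grows until it hits cap_ms.
    """
    cap_ms = initial_ms if cap_ms is None else cap_ms
    elapsed, delay, polls = 0.0, float(initial_ms), 0
    while True:
        polls += 1                      # one retrieve() call
        if elapsed >= run_ms:
            return polls                # this check observes the finished run
        elapsed += delay                # sleep before the next check
        delay = min(delay * factor, cap_ms)


fixed = count_polls(10_000, 100)                 # poll every 100 ms
backoff = count_polls(10_000, 500, 1.5, 5_000)   # 500 ms start, x1.5, 5 s cap
print(fixed, backoff)
```

For a 10-second run, the fixed 100ms schedule issues 101 checks and the backoff schedule issues 7, at the price of noticing completion a fraction of a second later.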

Request code

python

import time
from openai import OpenAI

client = OpenAI()

# Create a thread
thread = client.beta.threads.create()
thread_id = thread.id

# Add a message
client.beta.threads.messages.create(
    thread_id=thread_id,
    role="user",
    content="What is 2+2?"
)

# Create a run (async)
run = client.beta.threads.runs.create(
    thread_id=thread_id,
    assistant_id="asst_abc123xyz"
)
run_id = run.id

# Polling with exponential backoff
wait_time = 0.5
max_wait_time = 5.0
max_total_wait = 60.0
start_time = time.time()

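while True:
    elapsed = time.time() - start_time
    if elapsed > max_total_wait:
        print(f"Timeout: run did not complete in {max_total_wait}s")
        break

    run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)

    if run.status == "completed":
        print(f"Run completed. Status: {run.status}")
        # messages.list returns the newest message first by default
        messages = client.beta.threads.messages.list(thread_id=thread_id)
        # .text is an object; the string itself is in .text.value
        print(f"Response: {messages.data[0].content[0].text.value}")
        break
    elif run.status in ["failed", "expired", "cancelled", "incomplete"]:
        print(f"Run ended with status: {run.status}")
        if run.last_error:
            print(f"Error: {run.last_error.message}")
        break
    elif run.status == "requires_action":
        # The run is waiting on tool outputs; polling alone will never finish it
        print("Run requires action: submit tool outputs with submit_tool_outputs()")
        break
    else:
        # queued, in_progress, or cancelling: wait, then check again
        print(f"Status: {run.status}. Waiting {wait_time}s...")
        time.sleep(wait_time)
        wait_time = min(wait_time * 1.5, max_wait_time)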
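
The loop above can also be factored into a reusable helper. The following is a sketch rather than an SDK feature: poll_until_terminal and its parameters are hypothetical names, and the status fetch, sleep, and clock are injected so the backoff logic can be exercised without network access.

```python
import time

# Statuses after which a run will not change on its own
TERMINAL = {"completed", "failed", "expired", "cancelled", "incomplete"}


def poll_until_terminal(fetch_status, *, initial=0.5, factor=1.5, cap=5.0,
                        timeout=60.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() with exponential backoff until the run settles.

    Returns the terminal status, or "requires_action" when the run is
    waiting on tool outputs. Raises TimeoutError when the next sleep
    would go past the timeout.
    """
    deadline = clock() + timeout
    delay = initial
    while True:
        status = fetch_status()
        if status in TERMINAL or status == "requires_action":
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"run still {status!r} after {timeout}s")
        sleep(delay)
        delay = min(delay * factor, cap)


# Offline demonstration with a scripted sequence of statuses
scripted = iter(["queued", "in_progress", "completed"])
slept = []
result = poll_until_terminal(lambda: next(scripted), sleep=slept.append)
print(result, slept)
```

With the real client, and assuming thread_id and run_id from the script above, the fetch function would be lambda: client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id).status.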

Authentication

Your OpenAI API key must be set before instantiation. Export it: export OPENAI_API_KEY='sk-...' or pass directly: client = OpenAI(api_key='sk-...'). The SDK reads the environment variable at instantiation time, so set it before creating the client.

Response shape

Field	Example value
`id`	run_abc123xyz
`object`	thread.run
`thread_id`	thread_xyz789
`assistant_id`	asst_def456
`status`	completed (other values: queued, in_progress, requires_action, cancelling, cancelled, failed, incomplete, expired)
`created_at`	1704067200
`started_at`	1704067201
`completed_at`	1704067210
`expires_at`	1704067800
`last_error`	null
`model`	gpt-4-turbo
`instructions`	You are a helpful assistant.
`tools`	[]
`metadata`	{}

Field guide

status

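Current state of the run. <code>queued</code> = waiting to start, <code>in_progress</code> = executing, <code>requires_action</code> = waiting for your tool outputs, <code>cancelling</code> = cancellation in progress, <code>completed</code> = done, <code>failed</code> = error occurred, <code>cancelled</code> = stopped on request, <code>expired</code> = not finished before <code>expires_at</code>, <code>incomplete</code> = ended early, for example on a token limit.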

last_error

Populated only if status is <code>failed</code>. Contains <code>code</code> and <code>message</code> explaining why the run failed. Check this before retrying.

started_at

Unix timestamp when execution began. Null if not yet started. Use to calculate how long the run has been running.

expires_at

Unix timestamp at which the run expires. Runs expire roughly 10 minutes after creation if they have not finished, including time spent waiting in <code>requires_action</code>. An expired run cannot be resumed: create a new run on the same thread.

required_action

Present only while status is <code>requires_action</code>. Its <code>submit_tool_outputs.tool_calls</code> array lists the calls you must process; submit the results with <code>client.beta.threads.runs.submit_tool_outputs()</code> to resume the run.
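
As a rough sketch of the "process these, then submit" step, the helper below turns a list of tool calls into the tool_outputs list that submit_tool_outputs() expects. build_tool_outputs and the add handler are hypothetical; the stand-in objects mimic only the attributes used here (id, function.name, function.arguments), so the example runs without an API key.

```python
import json
from types import SimpleNamespace


def build_tool_outputs(tool_calls, handlers):
    """Run each requested function locally and collect its JSON-encoded result."""
    outputs = []
    for call in tool_calls:
        handler = handlers.get(call.function.name)
        if handler is None:
            # Report unknown tools back to the model instead of raising
            result = {"error": f"unknown tool {call.function.name}"}
        else:
            # Arguments arrive as a JSON string
            args = json.loads(call.function.arguments or "{}")
            result = handler(**args)
        outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
    return outputs


# Stand-in for run.required_action.submit_tool_outputs.tool_calls
fake_calls = [
    SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(name="add", arguments='{"a": 2, "b": 2}'),
    )
]
outputs = build_tool_outputs(fake_calls, {"add": lambda a, b: a + b})
print(outputs)
```

With a live run, the resulting list would be passed as client.beta.threads.runs.submit_tool_outputs(thread_id=..., run_id=..., tool_outputs=outputs), after which polling resumes.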

Setup trap

Setting OPENAI_API_KEY in your environment after creating the OpenAI() client has no effect: the SDK reads the key at instantiation time. If your code sets the env var inside a function but creates the client at module load time, the client is built without that key, and the failure surfaces at the constructor or on the first request, far from the line that caused it. Always set OPENAI_API_KEY before creating the client, or pass the key directly with OpenAI(api_key=...).
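
One way to make this trap fail loudly is to check for the key yourself right before building the client. This is a small sketch under that assumption; require_api_key is a hypothetical helper, not part of the SDK.

```python
import os


def require_api_key(env=None):
    """Return the API key from the environment or fail with a clear message."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before creating the client"
        )
    return key


# Demonstration with an explicit mapping instead of the real environment
print(require_api_key({"OPENAI_API_KEY": "sk-test"}))
```

The client would then be created with OpenAI(api_key=require_api_key()), so a missing key is reported at the point where the client is built.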

Cost

Polling is deceptively expensive. At 1 request/second for 10 seconds per run, that's 10 API calls. At 1M runs/month, that's 10M calls (~$1,500 at an assumed $0.00015/call; treat the per-call rate as illustrative and check it against your own bill). Webhooks eliminate the wasted calls. A webhook endpoint adds nothing in OpenAI API fees and roughly $20–50/month in cloud infrastructure. The break-even point is roughly 50 concurrent runs, though it shifts with run length and polling interval.
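
The arithmetic is easy to rerun with your own numbers. The sketch below multiplies runs, polls per run, and a per-call rate; the function name is hypothetical and the rate is the illustrative figure quoted in this section, not a published price.

```python
def monthly_polling_cost(runs_per_month, polls_per_run, price_per_call):
    """Return (total status calls, estimated cost) for one month of polling."""
    calls = runs_per_month * polls_per_run
    return calls, calls * price_per_call


# 1M runs/month, 10 polls per run, assumed $0.00015 per call
calls, cost = monthly_polling_cost(1_000_000, 10, 0.00015)
print(f"{calls:,} calls, about ${cost:,.2f}")
```

Halving the polls per run halves both numbers, which is why the polling interval matters more than any other knob here.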

Rate limits

Polling without backoff triggers rate limits immediately. Each <code>retrieve()</code> counts against your RPM (requests-per-minute) quota. With multiple concurrent runs, naive polling can exhaust your quota in seconds. Implement exponential backoff and batch status checks if polling >50 runs. Better: use webhooks and skip polling entirely.

Common gotcha

Polling without backoff or a retry limit. Developers write while True: run = client.beta.threads.runs.retrieve(...) in a tight loop. With 100 concurrent runs, this hits rate limits within seconds. Add exponential backoff (start at 0.5s, cap at 5s) and a timeout (60s max). Also: run.status read immediately after create() is almost always queued, because the run hasn't started yet.

Error recovery

AuthenticationError

Invalid API key or expired key. Verify <code>OPENAI_API_KEY</code> is set correctly before <code>OpenAI()</code> instantiation. No workaround: fix the key.

NotFoundError

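Run ID or thread ID doesn't exist for the key you are using. This typically means a mistyped or swapped ID, a deleted thread, or a key that belongs to a different project. A run that merely expired is normally still retrievable and reports status <code>expired</code>. Store run IDs immediately after creation.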

RateLimitError

Polling too aggressively. Implement exponential backoff: wait 0.5s, then 0.75s, then 1.1s, capped at 5s. Or switch to webhooks.

APIError with 500

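OpenAI server error, surfaced by the SDK as <code>InternalServerError</code>. Retry with exponential backoff (2s, 4s, 8s, cap 60s). These errors are usually transient, so a later attempt normally succeeds; give up after a bounded number of retries.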
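
A bounded retry with the 2s, 4s, 8s schedule can be sketched as below. retry_with_backoff is a hypothetical helper; the exception types and the sleep function are injected so it can be tried offline. Note that the openai Python SDK already retries some failures on its own (configurable through the client's max_retries option), so an outer wrapper like this is only needed beyond that.

```python
import time


def retry_with_backoff(call, *, retryable=(Exception,), delays=(2, 4, 8),
                       sleep=time.sleep):
    """Call call(); on a retryable error, wait and try again.

    Makes len(delays) + 1 attempts in total. The final attempt is not
    wrapped, so its error propagates to the caller.
    """
    for delay in delays:
        try:
            return call()
        except retryable:
            sleep(delay)
    return call()


# Offline demonstration: fail twice, then succeed
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("simulated 500")
    return "ok"


slept = []
outcome = retry_with_backoff(flaky, retryable=(ConnectionError,), sleep=slept.append)
print(outcome, slept)
```

With the real client, retryable could be (openai.InternalServerError, openai.RateLimitError) and call a lambda wrapping the retrieve() request.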

Experienced dev note

The real cost of polling is hidden. Many teams poll every 500ms to feel responsive, then are shocked by $10k/month API bills. The fix is simple: set a polling interval of 2–3 seconds minimum. Runs rarely complete faster, so you save roughly 80% of calls for negligible added latency.

For production, webhooks typically pay for themselves within about a month. Set up a simple Lambda + SQS or FastAPI endpoint to receive webhook events. You'll also get better error observability: a webhook payload can carry the error details with the event, whereas polling makes you inspect last_error yourself.

One more thing: requires_action status is a foot-gun. If your code doesn't handle tool use, the run sits in requires_action until it expires while your loop keeps polling. Always check for it before polling again.
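
Whatever receives webhook events has to tolerate duplicate deliveries. The sketch below shows the idempotency idea only; the event fields (id, run_id, status) are a hypothetical payload shape invented for illustration, not a documented schema, and the in-memory set stands in for a durable store.

```python
def make_event_handler(on_status):
    """Build a handler that processes each event id at most once."""
    seen = set()  # in production this would be a database or cache

    def handle(event):
        event_id = event["id"]
        if event_id in seen:
            return False          # duplicate delivery: acknowledge and skip
        seen.add(event_id)
        on_status(event["run_id"], event["status"])
        return True

    return handle


# Offline demonstration: the same event delivered twice
updates = []
handle = make_event_handler(lambda run_id, status: updates.append((run_id, status)))
event = {"id": "evt_1", "run_id": "run_abc123xyz", "status": "completed"}
first = handle(event)
second = handle(event)
print(first, second, updates)
```

Inside a Lambda or FastAPI endpoint, handle would be called with the parsed request body, and the endpoint would return a success response in both cases so the sender stops retrying.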

Check your understanding

Your assistant handles a tool call (function invocation) during a run, and you've submitted the tool outputs. You then immediately poll for status. Why might the status still be requires_action, and what should you do?

Answer hint

After you submit tool outputs, the run goes back into the queue and the server needs time to move it forward, so do not assume the status changes instantly. If a later retrieve still shows <code>requires_action</code>, the model has most likely requested another round of tool calls that you also need to answer. Always treat <code>requires_action</code> as a loop: retrieve → parse tools → submit outputs → wait → retrieve again.

VERSION This pattern applies to openai 1.0+. The Assistants API is beta-tagged, so OpenAI may ship breaking changes: always pin your openai version in requirements.txt. OpenAI has positioned the Responses API as the successor to Assistants, so check the deprecation notices and release notes before starting new work on client.beta.threads.runs or making a major upgrade.
