Polling and retrieving results
Why this matters
Long-running requests (batch processing, file analysis, or complex reasoning) tie up connections if you wait synchronously. Polling lets you submit work, do other things, and check back later, which is essential for production systems handling multiple concurrent tasks.
Explanation
The Anthropic API supports asynchronous request patterns through batch processing and file handling. When you submit certain types of requests, such as processing large documents or running analyses, the API returns immediately with a request ID rather than waiting for completion. Polling is the pattern of periodically checking the status of that request until it completes.
Under the hood, your request gets queued on Anthropic's servers. Each poll is a separate HTTP GET request that checks if the work is done. The API tracks status (submitted, processing, completed, failed) and returns the full result only when ready. This is different from webhooks (push notification when done) and requires you to manage the polling loop yourself: deciding frequency, backoff strategy, and timeout handling.
Use polling when (1) you're processing files or batches that take minutes to hours, (2) you need to handle multiple requests concurrently without threads, or (3) you want to implement custom retry/timeout logic. Don't poll in a tight loop: space the polls out with exponential backoff and cap the total wait so you stay under rate limits.
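The backoff-plus-timeout pattern described above can be sketched independently of any SDK. In this minimal sketch, `poll_until_done` and its `fetch_status` callable are hypothetical names (not part of the Anthropic SDK); `fetch_status` stands in for whatever status check your endpoint provides, and the status strings are the ones used in this guide.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], str],
    max_wait: float = 300.0,
    initial_interval: float = 2.0,
    backoff: float = 1.5,
    max_interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until a terminal status is seen or max_wait seconds elapse.

    Returns 'completed', 'failed', or 'timeout'.
    fetch_status is a hypothetical callable that returns the current status.
    """
    deadline = clock() + max_wait
    interval = initial_interval
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            return "timeout"
        # Never sleep past the deadline, then lengthen the next wait.
        sleep(min(interval, remaining))
        interval = min(interval * backoff, max_interval)
```

Passing `sleep` and `clock` as parameters is a design choice for testability: a unit test can substitute fakes and run the whole loop without waiting in real time.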
Request code
import os
import time

from anthropic import Anthropic

client = Anthropic(api_key=os.environ.get('ANTHROPIC_API_KEY'))

with open('document.txt', 'r') as f:
    document_content = f.read()

# Submit the analysis request. Note that a standard messages.create call
# blocks until the reply is ready; the polling below assumes the request was
# accepted asynchronously (see "Setup trap").
response = client.messages.create(
    model='claude-opus-4-6',
    max_tokens=1024,
    messages=[
        {
            'role': 'user',
            'content': f'Analyze this document and extract key themes:\n\n{document_content}'
        }
    ]
)
request_id = response.id
print(f'Request submitted with ID: {request_id}')

max_wait = 300      # give up after 5 minutes
start_time = time.time()
poll_interval = 2   # seconds; grows by 1.5x per poll, capped at 30

while time.time() - start_time < max_wait:
    status_response = client.messages.retrieve(request_id)
    if status_response.status == 'completed':
        print('Request completed.')
        print(f'Result: {status_response.content[0].text}')
        break
    elif status_response.status == 'failed':
        print(f'Request failed: {status_response.error}')
        break
    else:
        print(f'Status: {status_response.status}, retrying in {poll_interval}s...')
        time.sleep(poll_interval)
        poll_interval = min(poll_interval * 1.5, 30)
else:
    # The while condition became false without a break: the deadline passed.
    print(f'Request timed out after {max_wait} seconds')

Authentication
Set your Anthropic API key before instantiating the client:

    export ANTHROPIC_API_KEY='sk-ant-...'

Or pass it directly:

    from anthropic import Anthropic
    client = Anthropic(api_key='sk-ant-...')

The SDK reads ANTHROPIC_API_KEY from environment variables at instantiation time.
Response shape
| Field | Description |
|---|---|
| id | string: unique identifier for this request |
| status | string: one of 'submitted', 'processing', 'completed', or 'failed' |
| content | list: if completed, contains the message content; empty until done |
| error | object: present only if status is 'failed'; contains 'type' and 'message' |
| created_at | string (ISO 8601): timestamp when request was submitted |
Field guide
status: The field that drives the polling loop. Poll until it's 'completed' or 'failed'. Intermediate statuses are 'submitted' (queued) and 'processing' (running).
content: The actual result, populated only once status == 'completed'. It has the same structure as a synchronous response: check content[0].text for the assistant's reply.
error: Often overlooked. If status is 'failed', this object tells you why. Don't assume failure means 'try again'; some errors (such as invalid input) won't resolve with retries.
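To make the retry-or-not decision on the error field concrete, here is a minimal sketch that works on a plain dict shaped like the 'type' and 'message' object described above. The helper names are hypothetical. The error type strings follow the Anthropic API's documented error types, but which ones to treat as retryable is an assumption made for illustration.

```python
# Error types treated as transient here. This split is an assumption for
# illustration; adjust it to the errors your endpoint actually returns.
RETRYABLE_ERROR_TYPES = {"rate_limit_error", "overloaded_error", "api_error"}


def is_retryable(error: dict) -> bool:
    """Return True if resubmitting later might succeed."""
    return error.get("type") in RETRYABLE_ERROR_TYPES


def describe_failure(response: dict) -> str:
    """Summarize a failed response and the suggested next step."""
    error = response.get("error") or {}
    if is_retryable(error):
        action = "retry with backoff"
    else:
        action = "do not retry; fix the request"
    return f"{error.get('type', 'unknown')}: {error.get('message', '')} ({action})"
```

An invalid-input failure is then reported as non-retryable instead of being fed back into the retry loop.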
Setup trap
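The client.messages.retrieve(request_id) call only works for requests that were submitted with asynchronous semantics. A standard client.messages.create call is synchronous: it returns the finished message, so its ID has nothing left to poll, and calling retrieve on it will produce an error. Not all request types support polling, so submit through an async-friendly endpoint or batch submission (in the Python SDK, the Message Batches interface under client.messages.batches) and confirm the exact method and status field names in the SDK reference before relying on them.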
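For the batch submission route, a polling helper might look like the sketch below. It assumes the Python SDK's Message Batches interface, where `client.messages.batches.retrieve(batch_id)` returns an object whose `processing_status` becomes `'ended'` when the batch finishes. Note that this field name and its values differ from the simplified status field used elsewhere in this guide. `wait_for_batch` is a hypothetical helper name, and the fixed 30-second interval is an arbitrary choice.

```python
import time


def wait_for_batch(client, batch_id, max_wait=3600.0, interval=30.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll a message batch until processing ends or max_wait elapses.

    Assumes client.messages.batches.retrieve() returns an object with a
    processing_status attribute, as in the Message Batches interface.
    """
    deadline = clock() + max_wait
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        if clock() >= deadline:
            raise TimeoutError(
                f"batch {batch_id} still {batch.processing_status} after {max_wait}s"
            )
        sleep(interval)
```

With a real client you would create the batch first through `client.messages.batches.create(...)` and pass its `id` here. Once the helper returns, the per-request outcomes are read separately, for example by iterating over `client.messages.batches.results(batch_id)`.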
Cost
Each polling call is a separate API request and counts toward your usage limits. If you poll every second for 5 minutes, that's 300 requests for a single result. Use exponential backoff and set reasonable timeouts to minimize wasted calls. The overhead scales with the number of polls, not with the size of the job: a job polled 1,000 times generates 1,000 requests of overhead, however small its result is.
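The arithmetic is easy to check with a small simulation. The hypothetical helper below counts how many status checks a loop makes within a time budget, assuming each check is instantaneous and the loop polls first and sleeps afterwards, like the request code in this guide.

```python
def count_polls(max_wait, initial_interval, factor=1.0, cap=float("inf")):
    """Count status checks made before max_wait seconds have elapsed.

    Simulates a poll-then-sleep loop; the time spent in each poll is ignored.
    """
    elapsed = 0.0
    interval = initial_interval
    polls = 0
    while elapsed < max_wait:
        polls += 1
        elapsed += interval
        interval = min(interval * factor, cap)
    return polls


# Fixed 1-second interval over 5 minutes:
fixed = count_polls(300, 1.0)                          # 300 polls
# 2-second start, 1.5x backoff, 30-second cap over the same 5 minutes:
backed_off = count_polls(300, 2.0, factor=1.5, cap=30.0)  # 15 polls
```

Under these assumptions, backoff cuts the request count for a five-minute wait from 300 to 15.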
Rate limits
Rapid polling (more than once per second) will trigger rate limits (429 Too Many Requests). If you're polling many requests concurrently, stagger the polls and use exponential backoff. Implement jitter (random delay) to avoid thundering herd problems if multiple workers poll simultaneously.
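One way to add jitter is the "equal jitter" variant of exponential backoff, sketched below: half of each delay is fixed and half is random, so workers spread out without any delay collapsing toward zero. The function name and default values are assumptions for illustration.

```python
import random


def jittered_delay(attempt, base=2.0, cap=30.0, rng=random):
    """Equal-jitter backoff: a delay between half the ceiling and the ceiling.

    attempt is the zero-based poll number; the ceiling doubles each attempt
    until it reaches cap seconds.
    """
    ceiling = min(cap, base * (2 ** attempt))
    half = ceiling / 2
    return half + rng.uniform(0.0, half)
```

With the defaults, the first delay falls between 1 and 2 seconds and later delays settle between 15 and 30 seconds. Passing a seeded `random.Random` instance as `rng` makes the sequence reproducible in tests.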
Common gotcha
Polling too frequently will hit rate limits. Many developers write a tight loop while True: check_status() without backoff, then wonder why they get 429 errors. Always implement exponential backoff starting at 1–2 seconds and capping at 30 seconds between polls.
Error recovery
APIError with 429
APIError with 404
APIConnectionError
APIError with 'invalid_request_id'
Experienced dev note
Don't run polling loops inline in high-scale systems. Use a background job queue (Celery, AWS SQS, Google Cloud Tasks) with a worker that polls and stores results in a database. Polling in a web request handler blocks threads and wastes resources. Also, always set a timeout and log request IDs; you'll need them to debug stuck jobs in production.
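The worker side of that arrangement can be sketched with the standard library alone. In this minimal sketch, an in-memory `queue.Queue` stands in for the job queue and a dict stands in for the database. `drain_jobs` and `fetch` are hypothetical names, and `fetch` is assumed to return a dict with the status field described in this guide.

```python
import queue
import time
from typing import Callable, Dict


def drain_jobs(
    pending: "queue.Queue[str]",
    fetch: Callable[[str], dict],
    results: Dict[str, dict],
    interval: float = 5.0,
    max_checks: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Check queued request IDs one at a time; requeue the unfinished ones.

    Stops when the queue is empty or after max_checks status checks.
    """
    checks = 0
    while checks < max_checks:
        try:
            request_id = pending.get_nowait()
        except queue.Empty:
            return  # nothing left to poll
        checks += 1
        response = fetch(request_id)
        if response["status"] in ("completed", "failed"):
            results[request_id] = response  # stand-in for a database write
        else:
            pending.put(request_id)  # still running: check again later
            sleep(interval)          # pause so requeued jobs are not hammered
```

The web handler only enqueues a request ID and returns; the worker owns the waiting. A production version would replace the in-memory queue and dict with the real queue service and database, and add per-job deadlines.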
Check your understanding
You submit a request at 2:00 PM and get back request_id='abc123'. Your polling loop checks status every 2 seconds and sees 'processing' at 2:00:30 PM, then 2:00:32 PM. At 2:05 PM, status is still 'processing'. Should you keep polling indefinitely, and what's the actual risk if you don't set a timeout?
Answer hint
Set a hard timeout rather than polling indefinitely. The real risk isn't that the request hangs forever; it's that something failed silently without ever returning a 'failed' status, and you're wasting quota and compute polling a ghost. In production, log the request ID and alert after 5–10 minutes of 'processing' status.