Submitting a batch
Why this matters
Batches let you process large volumes of requests cost-effectively when you don't need real-time responses: ideal for data processing pipelines, content generation at scale, or analyzing thousands of documents overnight.
Explanation
What it does: The Anthropic Message Batches API lets you submit many message requests in a single call, which Claude then processes asynchronously. Batched requests are billed at 50% of standard synchronous rates for both input and output tokens, with no separate per-request fee.
How it works: You build a list of requests, each with a unique custom_id and a params object holding the usual Messages API arguments, and pass it to client.beta.messages.batches.create(requests=...). No file upload is involved. The API returns a batch ID immediately. You poll client.beta.messages.batches.retrieve(batch_id) to check status, and once processing_status is 'ended' you stream the per-request results with client.beta.messages.batches.results(batch_id).
When to use it: Batch processing works best for workflows that can tolerate latency of minutes to hours: bulk content moderation, summarizing document libraries, generating product descriptions from templates, or running weekly analysis jobs. Each batch can contain up to 100,000 requests or 256 MB of request data, whichever is reached first. Most batches finish in under an hour, and any requests still unprocessed 24 hours after submission expire.
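For the bulk workloads described above, the request list is usually generated from data rather than written by hand. The following is a minimal sketch, assuming each request is a dict with a custom_id and a params object as in the request code in this guide. build_requests and chunk are hypothetical helper names, not SDK functions, and the chunk size is a parameter you would set to the per-batch cap.

```python
from typing import Iterator


def build_requests(documents: dict[str, str], model: str, max_tokens: int = 256) -> list[dict]:
    """Turn {doc_id: text} into batch request dicts, using doc_id as custom_id."""
    return [
        {
            "custom_id": doc_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [
                    {"role": "user", "content": f"Summarize this in one sentence: {text}"}
                ],
            },
        }
        for doc_id, text in documents.items()
    ]


def chunk(requests: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` requests, one slice per batch."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(requests), size):
        yield requests[start:start + size]


# Illustrative data only; size=2 stands in for the real per-batch cap.
docs = {
    "doc-1": "Machine learning learns patterns from data.",
    "doc-2": "Deep learning stacks many neural network layers.",
    "doc-3": "Reinforcement learning optimizes behavior from rewards.",
}
batches = list(chunk(build_requests(docs, model="claude-opus-4-6"), size=2))
print([len(b) for b in batches])  # [2, 1]
```

Because custom_id is how results are matched back to inputs, deriving it from a stable document identifier keeps that mapping trivial.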
Request code
import time

import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

# Each request pairs a unique custom_id with ordinary Messages API params.
requests = [
    {
        "custom_id": "request-1",
        "params": {
            "model": "claude-opus-4-6",
            "max_tokens": 256,
            "messages": [
                {"role": "user", "content": "Summarize this in one sentence: Machine learning is a subset of artificial intelligence focused on learning patterns from data."}
            ],
        },
    },
    {
        "custom_id": "request-2",
        "params": {
            "model": "claude-opus-4-6",
            "max_tokens": 256,
            "messages": [
                {"role": "user", "content": "Summarize this in one sentence: Deep learning uses neural networks with multiple layers to extract increasingly abstract representations."}
            ],
        },
    },
]

# Submit the list directly; the model is set per request inside params.
batch = client.beta.messages.batches.create(requests=requests)
batch_id = batch.id
print(f"Created batch: {batch_id}")
print(f"Status: {batch.processing_status}")

# Poll until processing ends. A fixed interval keeps the example short.
while True:
    batch = client.beta.messages.batches.retrieve(batch_id)
    print(f"Batch status: {batch.processing_status}")
    if batch.processing_status == "ended":
        break
    time.sleep(60)

# results() yields one entry per request; match entries by custom_id.
results = list(client.beta.messages.batches.results(batch_id))
print(f"\nResults ({len(results)} requests):")
for entry in results:
    if entry.result.type == "succeeded":
        print(f"Request {entry.custom_id}: {entry.result.message.content[0].text}")
    else:
        # Errored, canceled, or expired requests carry no message.
        print(f"Request {entry.custom_id}: {entry.result.type}")
Authentication
Batches require a standard Anthropic API key set as the ANTHROPIC_API_KEY environment variable, which anthropic.Anthropic() reads by default. The batch endpoints ship with the anthropic Python SDK; this guide assumes version 0.94.x or later. No additional authentication setup is needed beyond standard API key configuration.
Response shape
| Field | Description |
|---|---|
| id | string: unique batch identifier |
| type | string: always 'message_batch' |
| processing_status | string: 'in_progress', 'canceling', or 'ended' |
| request_counts | object: number of requests in each state (processing, succeeded, errored, canceled, expired) |
| results_url | string: URL of the JSONL results file (null until processing_status == 'ended') |
| created_at | string: ISO 8601 timestamp when the batch was created |
| expires_at | string: ISO 8601 timestamp when the batch stops processing and unfinished requests expire |
Field guide
processing_status: Keep polling while the status is 'in_progress'. Stop when it is 'ended'. This is the field that tells you when to call results().
request_counts: Audit this before calling results(). If errored > 0, expect entries whose result.type is 'errored' in the results and handle those failures separately. There is no separate error file.
results_url: Only populated when processing_status == 'ended'. The SDK's results(batch_id) method reads from this URL for you; fetch it directly only if you need the raw JSONL file.
expires_at: Marks the 24-hour processing deadline, not the lifetime of the results. Results remain available for 29 days after the batch is created, so download and store anything you need before then or it becomes permanently unavailable.
Setup trap
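The examples here call the Batches API through the .beta namespace (client.beta.messages.batches). Recent SDK releases also expose the same methods as client.messages.batches, but on an older pinned version whichever namespace is missing raises an AttributeError at the call site; it does not fail silently. If you see that error, upgrade the anthropic package and check its changelog for the release that added batch support.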
Cost
Batches cost 50% of standard synchronous API rates for both input and output tokens. There is no additional per-request fee. Example: 1,000 requests averaging 500 input tokens and 200 output tokens would cost approximately 0.5 × [(1000 × 500 × input-token-rate) + (1000 × 200 × output-token-rate)]. Because the discount is a flat multiplier, the saving for claude-opus-4-6 at April 2026 pricing is 50% relative to synchronous API calls for the same workload, whatever the per-token rates are.
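The arithmetic above can be sketched as a small estimator. This assumes the 50% batch discount applies to both input and output tokens, as Anthropic's pricing documentation describes. batch_cost is a hypothetical helper, and the per-million-token rates below are placeholders chosen for easy arithmetic, not real prices for any model.

```python
def batch_cost(
    n_requests: int,
    input_tokens: int,
    output_tokens: int,
    input_rate: float,
    output_rate: float,
    discount: float = 0.5,
) -> float:
    """Estimated cost in dollars; rates are dollars per million tokens."""
    input_cost = n_requests * input_tokens * input_rate / 1_000_000
    output_cost = n_requests * output_tokens * output_rate / 1_000_000
    # The discount is applied to the whole token bill.
    return (input_cost + output_cost) * (1 - discount)


# Placeholder rates (assumption): $10 per million input, $50 per million output.
sync = batch_cost(1000, 500, 200, input_rate=10.0, output_rate=50.0, discount=0.0)
batched = batch_cost(1000, 500, 200, input_rate=10.0, output_rate=50.0)
print(f"synchronous ${sync:.2f}, batched ${batched:.2f}")
# synchronous $15.00, batched $7.50
```

Substitute the current published rates for your model to get a real figure; the ratio between the two numbers stays the same.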
Rate limits
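The Batches API has its own rate limits, separate from those on synchronous Messages calls: a cap on HTTP requests per minute to the batch endpoints, and a cap on how many batch requests can be waiting in the processing queue at once. Both vary by usage tier, so check your account's limits rather than assuming a fixed figure. If you submit batches too frequently, you'll hit these limits. Stagger batch submissions or request higher limits from Anthropic support.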
Common gotcha
A common mistake is to poll for batch completion once, see 'in_progress', and move on without checking status again. Batches can take anywhere from a few minutes to hours, so implement polling with exponential backoff rather than a short fixed interval such as 5 seconds. Also, the results() method returns an iterator of individual result objects, not a single response. You must iterate through it or convert it to a list to access all results, and you should match each result to its request by custom_id rather than by position.
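One way to implement the backoff polling recommended above is sketched here. wait_until_ended is a hypothetical helper, and the delay values are illustrative assumptions rather than recommended settings. The status source and the sleep function are injected so the loop can be exercised without network access.

```python
import time
from typing import Callable


def wait_until_ended(
    fetch_status: Callable[[], str],
    initial_delay: float = 30.0,
    max_delay: float = 600.0,
    factor: float = 2.0,
    timeout: float = 24 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll fetch_status() until it returns "ended"; return the number of polls."""
    delay, waited, polls = initial_delay, 0.0, 0
    while True:
        polls += 1
        if fetch_status() == "ended":
            return polls
        if waited >= timeout:
            raise TimeoutError(f"batch not finished after {waited:.0f}s")
        sleep(delay)
        waited += delay
        # Grow the delay geometrically, but never beyond max_delay.
        delay = min(delay * factor, max_delay)


# Offline demonstration: a fake status source and a sleep that only records delays.
statuses = iter(["in_progress", "in_progress", "ended"])
delays: list[float] = []
polls = wait_until_ended(lambda: next(statuses), sleep=delays.append)
print(polls, delays)  # 3 [30.0, 60.0]
```

With the real client, fetch_status would be something like `lambda: client.beta.messages.batches.retrieve(batch_id).processing_status`, and the default time.sleep would be left in place.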
Error recovery
APIConnectionError
AuthenticationError
RateLimitError
BadRequestError
Experienced dev note
Store batch IDs and submission timestamps in a durable store (database, message broker, or file system) before you start polling. If your process crashes mid-poll, you can look up the batch and recover its status later. Also: batch processing latency is not linear in batch size. A 10,000-request batch often takes roughly the same time as a 1,000-request batch because both sit in the same processing queue, and you pay no cost premium for scale beyond token pricing. This makes batches ideal for off-peak processing of large workloads.
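A minimal sketch of the durable record described above, using SQLite from the standard library. The table layout and the helper names (open_store, record_submission, mark_done, pending) are assumptions for illustration, and the batch IDs are made up.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the tracking database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS batches ("
        "batch_id TEXT PRIMARY KEY, "
        "submitted_at TEXT NOT NULL, "
        "done INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def record_submission(conn: sqlite3.Connection, batch_id: str) -> None:
    """Persist the batch ID before any polling starts."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO batches (batch_id, submitted_at) VALUES (?, ?)",
            (batch_id, datetime.now(timezone.utc).isoformat()),
        )


def mark_done(conn: sqlite3.Connection, batch_id: str) -> None:
    """Flag a batch whose results have been fetched and stored."""
    with conn:
        conn.execute("UPDATE batches SET done = 1 WHERE batch_id = ?", (batch_id,))


def pending(conn: sqlite3.Connection) -> list[str]:
    """Batch IDs that still need polling, e.g. after a restart."""
    rows = conn.execute("SELECT batch_id FROM batches WHERE done = 0 ORDER BY submitted_at")
    return [row[0] for row in rows]


# ":memory:" keeps the demo self-contained; use a file path so data survives a crash.
conn = open_store(":memory:")
record_submission(conn, "batch_a")
record_submission(conn, "batch_b")
mark_done(conn, "batch_a")
print(pending(conn))  # ['batch_b']
```

On startup, a worker would call pending() and resume polling each returned ID instead of resubmitting the work.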
Check your understanding
You've submitted a batch with 5,000 requests at 10 AM. At 10:30 AM, polling shows processing_status == 'in_progress' with request_counts of 2,000 succeeded, 2,500 processing, and 500 errored. Should you retrieve results now? What would you do about the 500 failed requests?
Answer hint
Results are only available when processing_status == 'ended'. Calling results() on a batch that is still 'in_progress' will fail. For the errored requests, wait for the batch to end, iterate the results, and pick out the entries whose result.type is 'errored' to see which custom_ids failed and why, then decide whether to retry those individually or via a new batch.
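Separating failures from successes once a batch has ended could look like the sketch below. It assumes each result entry exposes custom_id and a result object whose type is one of 'succeeded', 'errored', 'canceled', or 'expired', which is how the SDK's result entries are understood to be shaped. split_results is a hypothetical helper, and the entries are simulated with SimpleNamespace so the example runs offline.

```python
from types import SimpleNamespace


def split_results(entries):
    """Return ({custom_id: message} for successes, {custom_id: outcome} for the rest)."""
    succeeded, failed = {}, {}
    for entry in entries:
        if entry.result.type == "succeeded":
            succeeded[entry.custom_id] = entry.result.message
        else:
            # errored, canceled, and expired requests produce no message
            failed[entry.custom_id] = entry.result.type
    return succeeded, failed


def fake_entry(custom_id, kind, message=None):
    """Stand-in for an SDK result entry (assumption: same attribute names)."""
    return SimpleNamespace(
        custom_id=custom_id,
        result=SimpleNamespace(type=kind, message=message),
    )


entries = [
    fake_entry("request-1", "succeeded", "summary text"),
    fake_entry("request-2", "errored"),
    fake_entry("request-3", "expired"),
]
succeeded, failed = split_results(entries)
retry_ids = sorted(failed)
print(retry_ids)  # ['request-2', 'request-3']
```

The retry_ids list can then be used to rebuild just the failed requests and submit them as a new, smaller batch.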