API Advanced medium · 6 min

Sonnet for generation tasks

What you will learn
Use Claude Sonnet 4.6 for cost-optimized content generation, code synthesis, and structured output tasks where Opus latency is unacceptable.

Why this matters

Sonnet balances cost and speed for production workloads: choosing the right model prevents overspending on Opus for tasks that don't need its reasoning depth, while avoiding quality loss from using a smaller model.

Skip if: Use Opus 4.6 for complex reasoning, multi-step planning, or scientific analysis. Use Haiku 3.5 for simple classification, formatting, or retrieval-augmented generation where latency matters more than nuance.

Explanation

What it does: Claude Sonnet 4.6 is Anthropic's mid-tier model: faster than Opus with 2x cost efficiency: designed for production generation tasks like email drafting, code generation, summarization, and data transformation. How it works: Sonnet uses the same transformer architecture as Opus but with reduced model capacity and optimized inference kernels, trading a small amount of reasoning capability for 3-5x lower latency and 50% cost reduction per token. The model maintains strong instruction-following and structured output capability through alignment training. When to use it: Choose Sonnet when you need fast, cost-effective generation at scale: batch processing documents, generating API responses, or creating variants of content. Use it when your task requires good writing quality and instruction adherence but not novel problem-solving.

Request code

python
from anthropic import Anthropic

client = Anthropic()

message = client.messages.create(
    model='claude-sonnet-4-6',
    max_tokens=1024,
    messages=[
        {
            'role': 'user',
            'content': 'Write a professional email requesting a meeting with a client named Alex about Q2 budget planning. Keep it to 3 sentences.'
        }
    ]
)

print('Generated email:')
print(message.content[0].text)
print(f'\nTokens used - Input: {message.usage.input_tokens}, Output: {message.usage.output_tokens}')

Authentication

Set your Anthropic API key as an environment variable before running code: export ANTHROPIC_API_KEY='sk-ant-...'. The SDK reads this automatically on client instantiation.

Response shape

FieldDescription
id msg_xxxxx (unique message identifier)
type message (always 'message')
role assistant
content [{"type": "text", "text": "Generated response here"}]
model claude-sonnet-4-6
stop_reason end_turn (normal completion) or max_tokens (hit limit)
stop_sequence null or custom stop sequence if provided
usage [object Object]

Field guide

stop_reason

If 'max_tokens', your response was truncated: you need more tokens. If 'end_turn', generation completed naturally.

usage.cache_creation_input_tokens

Tokens written to prompt cache on this request: only non-zero if you enabled prompt caching, saves money on repetitive requests.

usage.cache_read_input_tokens

Tokens read from cache on this request: if non-zero, those tokens cost 90% less than regular input tokens.

Setup trap

The SDK reads ANTHROPIC_API_KEY at client instantiation time. If you set the environment variable after creating the Anthropic() client, the key won't be picked up: initialize the client after your environment is fully configured.

Cost

Sonnet costs $3/MTok input, $15/MTok output. For a 1K-token generation task at scale, expect ~$0.003-0.015 per request. Opus costs 3x more ($15/$75). Over 10K daily requests, switching from Opus to Sonnet saves ~$360/day.

Rate limits

Standard tier: 10K requests/min, 1M tokens/min. Sonnet's 3-5x lower latency means you'll hit request rate limits (not token limits) first. Implement exponential backoff on 429 responses; upgrade to higher tier if consistent batching exceeds 10K req/min.

Common gotcha

Developers often set max_tokens=2048 assuming Sonnet can handle it, but for generation tasks at scale, you'll hit rate limits (10K requests/min on standard tier). The real gotcha: not checking stop_reason == 'max_tokens' in production: this silently truncates responses in batch jobs without error.

Error recovery

RateLimitError
HTTP 429 with 'rate_limit_exceeded' message. Implement exponential backoff with jitter: wait 1s, then 2s, 4s, etc. Do not retry immediately.
InvalidRequestError
Check that max_tokens doesn't exceed 4096 for Sonnet. Verify model string is exactly 'claude-sonnet-4-6' (not 'sonnet' or 'claude-sonnet').
APIConnectionError
Transient network failure. Retry with exponential backoff up to 3 times. If persistent, check client.api_key is set and endpoint is reachable.
AuthenticationError
API key invalid or expired. Verify ANTHROPIC_API_KEY env var is set and starts with 'sk-ant-'. Regenerate key in Anthropic console if needed.

Experienced dev note

Cache your system prompts and few-shot examples in Sonnet calls. Prompt caching (via request headers) stores up to 4x 1M-token blocks per model, and cached tokens cost 90% less. For repetitive generation (email templates, code scaffolding), a single cached few-shot example pays for itself in 15-20 requests. Also: Sonnet's speed advantage over Opus appears in latency metrics (200-400ms vs 800ms-2s), not token throughput: use for user-facing APIs, not batch processing where you'd parallelize anyway.

Check your understanding

You're generating 5,000 customer support responses per day. Your current Opus-based system costs $450/day and takes 1.2 seconds per response. You switch to Sonnet at $0.15/response with 350ms latency. How much do you save daily, and why would the latency improvement matter more than the cost savings?

Show answer hint

Cost savings: ~$300/day ($450 - $150). Latency matters because 350ms allows real-time API responses to users (under 500ms perceived latency threshold); 1.2s forces you to queue requests or show loading spinners, degrading UX even though both are 'production-ready'.

VERSION Claude 3.5 Sonnet (claude-sonnet-4-6) is the current production model as of April 2026. Older model IDs like 'claude-3-sonnet-20240229' are deprecated and will return ModelNotFoundError. Always use the latest 'claude-sonnet-4-6' in new code.

Community Notes

No notes yetBe the first to share a version-specific fix or tip.