claude-3-5-sonnet: best for most tasks
Why this matters
Choosing the right model directly affects your application's latency, cost-per-request, and output quality. Sonnet is the sweet spot for developers who don't need the full capability of Opus but want better reasoning than Haiku.
Explanation
Claude 3.5 Sonnet is Anthropic's mid-tier model, first released in June 2024 and positioned as the default choice for production systems. It responds faster than Opus and costs less per token while keeping strong reasoning: Anthropic's launch announcement described it as running at about twice the speed of Claude 3 Opus. Anthropic does not publish architecture details or a single latency-versus-quality ratio, so benchmark it on your own workload rather than relying on a headline percentage. The model is well suited to code generation, content analysis, customer support automation, and general-purpose AI tasks. Use Sonnet when you need predictable performance without overthinking model selection: it's the model Anthropic optimizes for first when adding new features.
Request code
import os
from anthropic import Anthropic

# The SDK also reads ANTHROPIC_API_KEY on its own; passing it explicitly is optional.
client = Anthropic(api_key=os.environ.get('ANTHROPIC_API_KEY'))

message = client.messages.create(
    model='claude-3-5-sonnet-20241022',  # pinned, dated model ID
    max_tokens=1024,                     # upper bound on output tokens
    messages=[
        {
            'role': 'user',
            'content': 'Explain how quantum entanglement works in three sentences.'
        }
    ]
)

# Report which model served the request, why it stopped, the text, and token usage.
print(f'Model: {message.model}')
print(f'Stop reason: {message.stop_reason}')
print(f'Output: {message.content[0].text}')
print(f'Input tokens: {message.usage.input_tokens}')
print(f'Output tokens: {message.usage.output_tokens}')
Authentication
Set your API key as an environment variable before running code: export ANTHROPIC_API_KEY='sk-ant-...' (macOS/Linux) or set ANTHROPIC_API_KEY=sk-ant-... (Windows). The Anthropic SDK reads this automatically when you instantiate the client. No explicit authentication calls are needed.
Response shape
| Field | Description |
|---|---|
| id | msg_abc123: unique message identifier |
| type | message: always 'message' for this endpoint |
| role | assistant: indicates the response is from Claude |
| content | array of content blocks, typically [{'type': 'text', 'text': 'response...'}] |
| model | claude-3-5-sonnet-20241022: the model that served the request (an alias in the request resolves to a dated snapshot here) |
| stop_reason | end_turn, max_tokens, stop_sequence, or tool_use: why generation stopped |
| stop_sequence | null, or the custom stop sequence that ended generation (if stop_sequences was set in the request) |
| usage.input_tokens | number of tokens in your request (messages plus any system prompt) |
| usage.output_tokens | number of tokens in Claude's response |
Field guide
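stop_reason: Always check this. 'max_tokens' means Claude was cut off mid-thought; 'end_turn' means a natural stop. Treat a 'max_tokens' response as incomplete: retry with a higher max_tokens or ask the model to continue.
model: The model that actually served the request. If you request an alias rather than a dated ID, this field shows the snapshot it resolved to. Log it on every response so you can tell exactly which version produced each output.
content[0].text: The text of the first content block. content is always an array, and a response can hold more than one block (for example, tool_use blocks alongside text when tools are enabled), so check each block's type rather than assuming a single text block.
usage.output_tokens: Critical for cost tracking. Both input and output tokens are billed, so record both. Output tokens cost five times as much per token at the prices listed under Cost, while input tokens grow with every turn of a long conversation because the history is resent.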
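The checks in the field guide can be folded into one small helper. The following is a minimal sketch, not part of the SDK: `extract_text` and `TruncatedResponseError` are hypothetical names, and the `demo` object is a stand-in built with `SimpleNamespace` that only mimics the attribute names described above.

```python
from types import SimpleNamespace


class TruncatedResponseError(RuntimeError):
    """Raised when generation stopped because max_tokens was reached."""


def extract_text(message):
    """Join all text blocks of a response, refusing truncated output."""
    if message.stop_reason == "max_tokens":
        raise TruncatedResponseError(
            f"response cut off after {message.usage.output_tokens} output tokens"
        )
    # content is a list of blocks; keep only the text ones (skip e.g. tool_use).
    return "".join(block.text for block in message.content if block.type == "text")


# Stand-in object (an assumption for illustration), not a real API response.
demo = SimpleNamespace(
    stop_reason="end_turn",
    content=[
        SimpleNamespace(type="text", text="Hello"),
        SimpleNamespace(type="text", text=", world"),
    ],
    usage=SimpleNamespace(input_tokens=12, output_tokens=4),
)
print(extract_text(demo))  # prints: Hello, world
```

Raising on truncation instead of returning partial text is a design choice; a caller that prefers to continue the generation could catch the exception and retry with a larger max_tokens.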
Setup trap
The Anthropic SDK reads ANTHROPIC_API_KEY from os.environ at client instantiation time. If you set os.environ['ANTHROPIC_API_KEY'] after creating the client, it won't be picked up: reorder your code to set the env var before Anthropic() is called.
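One way to surface this ordering problem early is to fail fast before the client is created. This is a sketch under assumptions: `require_api_key` is a hypothetical helper, not an SDK function, and it only checks that the variable is present and non-empty.

```python
import os


def require_api_key(env=os.environ, name="ANTHROPIC_API_KEY"):
    """Return the API key from env, or fail with a clear message if it is missing."""
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; set it before creating the client")
    return key


# Demo with a plain dict standing in for os.environ.
print(require_api_key({"ANTHROPIC_API_KEY": "sk-ant-example"}))  # prints: sk-ant-example
```

Calling `require_api_key()` on the line before `Anthropic()` turns a confusing authentication failure at request time into an immediate, readable error at startup.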
Cost
Sonnet costs $3 per 1M input tokens and $15 per 1M output tokens (April 2026 pricing). A request with 1,000 input tokens and 1,000 output tokens costs about $0.018 ($0.003 for input plus $0.015 for output). Output tokens cost 5x as much as input tokens, so response length usually dominates the bill; measure your real input-to-output ratio from the usage fields rather than assuming one.
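The arithmetic behind the ~$0.018 figure can be checked directly: it corresponds to 1,000 input tokens plus 1,000 output tokens at the rates quoted above. A minimal sketch, where `request_cost_usd` and the two constants are illustrative names and the prices are simply the ones stated in this section:

```python
# Prices as quoted in this section, in USD per one million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def request_cost_usd(input_tokens, output_tokens):
    """Cost of one request, given the usage.input_tokens / usage.output_tokens counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


print(f"{request_cost_usd(1000, 1000):.3f}")  # prints: 0.018
```

Feeding the function the real `usage` values from each response gives a per-request cost that can be summed per customer or per feature.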
Rate limits
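Standard tier allows 40,000 RPM (requests per minute) and 2M TPM (tokens per minute). Limits vary by usage tier, so confirm the values for your account in the Anthropic Console. Most developers hit TPM limits before RPM. If rate-limited (HTTP 429), implement exponential backoff with jitter: wait roughly 1s, 2s, then 4s, each with a small random offset, before retrying.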
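The retry schedule described above can be sketched with the standard library alone. Everything here is illustrative: `backoff_delays` and `call_with_backoff` are hypothetical helpers, the jitter is a simple additive random offset (one of several common variants), and the 30-second cap is an assumption.

```python
import random
import time


def backoff_delays(max_retries=3, base=1.0, cap=30.0, jitter=1.0, rng=random.random):
    """Yield delays of base * 2**attempt (capped), plus up to `jitter` random seconds."""
    for attempt in range(max_retries):
        yield min(cap, base * 2 ** attempt) + jitter * rng()


def call_with_backoff(fn, retry_on=(Exception,), max_retries=3, sleep=time.sleep):
    """Call fn(); on a retryable error, sleep and try again, up to max_retries retries."""
    for delay in backoff_delays(max_retries):
        try:
            return fn()
        except retry_on:
            sleep(delay)
    return fn()  # final attempt: let any error propagate to the caller


# With the random offset forced to zero, the schedule is the 1s, 2s, 4s from the text.
print(list(backoff_delays(rng=lambda: 0.0)))  # prints: [1.0, 2.0, 4.0]
```

With the SDK installed, one would typically pass `retry_on=(anthropic.RateLimitError,)` and wrap the `client.messages.create(...)` call in a lambda. The SDK client also accepts a `max_retries` setting and retries some errors on its own, so a wrapper like this is mainly useful when you want control over the schedule.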
Common gotcha
Passing an alias such as model='claude-3-5-sonnet-latest' instead of a dated ID (like claude-3-5-sonnet-20241022) routes to the latest snapshot, so the model's behavior can change without any change in your code. Always pin the full model ID in production code to prevent silent breaking changes.
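A lightweight guard can catch unpinned IDs in configuration and flag any mismatch between the requested and served model. This is a sketch under one stated assumption: dated snapshot IDs end in an eight-digit date, as the ID used in this guide does. `is_pinned` and `check_served_model` are hypothetical helper names.

```python
import re

# Assumption: dated snapshot IDs end in -YYYYMMDD, like the ID used in this guide.
_DATED_SUFFIX = re.compile(r"-\d{8}$")


def is_pinned(model_id):
    """True if the model ID carries a dated snapshot suffix."""
    return _DATED_SUFFIX.search(model_id) is not None


def check_served_model(requested, served):
    """Return a warning string if the served model differs from the request, else None."""
    if served != requested:
        return f"requested {requested!r} but was served {served!r}"
    return None


print(is_pinned("claude-3-5-sonnet-20241022"))  # prints: True
print(is_pinned("claude-3-5-sonnet"))           # prints: False
```

In production, `check_served_model(requested_id, message.model)` could run on every response, with any non-None result sent to your logging or alerting.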
Error recovery
AuthenticationError
RateLimitError
InvalidRequestError with 'max_tokens'
APIConnectionError
APIStatusError with 400
Experienced dev note
Sonnet is the model Anthropic optimizes for in production, and feature rollouts often land here first. This is not a second-choice model: it's the strategic choice. Also: log the model version from every production response. A request that uses an alias can move to a newer snapshot without any change in your code; detecting that requires comparing response.model against the model ID you expect.
Check your understanding
Why would you get different outputs from two identical requests to Sonnet on the same day, and what should you check first?
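Answer hint
Some variation is expected even when nothing has changed: sampling is non-deterministic at the default temperature, so identical requests can produce different text. To rule out a version change, check response.model first and compare it against the model you requested. If you requested an alias, it may have resolved to a newer snapshot; a pinned, dated model ID removes that variable, which is why pinning matters.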