client.messages.create(): the core method
Why this matters
Every interaction with Claude goes through this single method. Understanding its parameters, response structure, and error modes is the foundation for building any Claude application: from simple chatbots to complex agentic systems.
Explanation
What it does: client.messages.create() sends a list of messages to Claude and returns a single Message object containing the model's response. It's a synchronous, blocking call that waits for the entire response before returning.
How it works: You provide a model (e.g., claude-opus-4-6), a max_tokens limit, and a messages list. The SDK serializes these to JSON, authenticates using your API key from the environment, sends an HTTPS POST to Anthropic's endpoint, and deserializes the response back into a Python object. The response includes the assistant's text, token usage metadata, and a stop reason (e.g., end_turn, max_tokens).
When to use it: Use this for any request where you need the full response before proceeding: question answering, summarization, code generation, or decision-making steps in a workflow. It's the simplest and most predictable way to call Claude.
Request code
import os
from anthropic import Anthropic
api_key = os.environ.get('ANTHROPIC_API_KEY')
if not api_key:
raise ValueError('ANTHROPIC_API_KEY environment variable not set')
client = Anthropic(api_key=api_key)
message = client.messages.create(
model='claude-opus-4-6',
max_tokens=1024,
messages=[
{
'role': 'user',
'content': 'What is the capital of France?'
}
]
)
print(message.content[0].text) Authentication
Set the environment variable ANTHROPIC_API_KEY before instantiating the client. The SDK reads this at client creation time, not at request time. Export ANTHROPIC_API_KEY='sk-ant-...' in your shell or set it in a .env file and load it with python-dotenv before importing Anthropic.
Response shape
| Field | Description |
|---|---|
id | Unique message identifier (string, e.g., 'msg_...') |
type | Always 'message' |
role | Always 'assistant' |
content | List of content blocks; for text, contains [{'type': 'text', 'text': 'response string'}] |
model | The model that generated the response (string) |
stop_reason | Why generation stopped: 'end_turn', 'max_tokens', or 'stop_sequence' |
stop_sequence | The sequence that triggered stop_reason, if applicable |
usage | Object with 'input_tokens' (int) and 'output_tokens' (int) |
created_at | ISO 8601 timestamp when the message was created (string) |
Field guide
content A list, not a string. Always index it with [0] to access the first (usually only) text block.
usage The field that tells you billing impact. input_tokens + output_tokens × 3 = approximate cost in USD cents for claude-opus-4-6 (varies by model).
stop_reason Critical for workflow logic. If it's 'max_tokens', you hit your limit and the response is incomplete: you likely need to increase max_tokens or chunk input.
created_at Developers often ignore this, but it's your proof of request timing for debugging rate-limit issues and correlating logs.
Setup trap
Setting ANTHROPIC_API_KEY in Python with os.environ['ANTHROPIC_API_KEY'] = '...' after instantiating Anthropic() will not work. The client reads the key at __init__ time. Always set the environment variable before importing or explicitly pass api_key=os.environ.get('ANTHROPIC_API_KEY') to the Anthropic constructor.
Cost
claude-opus-4-6 costs approximately $0.003 per 1K input tokens and $0.015 per 1K output tokens (April 2026 pricing). A 1,000-token input + 500-token output request costs ~$0.0105. Enable caching on system prompts longer than 1,024 tokens to reduce input costs by 90%.
Rate limits
Standard Anthropic accounts are rate-limited to 10 requests per second by default. If you exceed this, you'll receive a 429 status code. Implement exponential backoff with jitter (wait 1s, 2s, 4s, etc.) rather than retrying immediately. Batch API has no rate limits but requires asynchronous workflows.
Common gotcha
Accessing response.text instead of response.content[0].text. The response object has no .text attribute. You must access the content list and index into it, then access the .text property of that content block.
Error recovery
AuthenticationErrorRateLimitErrorInvalidRequestErrorAPIConnectionErrorExperienced dev note
Always inspect stop_reason in production. If it's 'max_tokens', your response is truncated and the model was cut off mid-sentence. For summarization or structured output, set max_tokens 30% higher than you think you need, then trim the response afterward. For real-time user-facing apps, switch to client.messages.stream() and yield tokens as they arrive: users will perceive 10x faster responses even if total latency is the same. Also: the messages list is immutable after the request; build it once and reuse for retries, don't modify it between attempts.
Check your understanding
Why does accessing response.text fail, and what is the correct way to extract Claude's response text from the response object?
Show answer hint
The response object structure is a Message with a content field that is a list of content blocks. You must index into that list and then access the .text property of the resulting ContentBlock.