Multi-turn conversation: the message array
Why this matters
Real applications require multi-turn conversations: users ask follow-ups, clarify previous responses, and build on context. The message array is how you preserve that context without retraining or manually concatenating strings.
Explanation
What it does: The messages parameter in Claude's API is a list of conversation turns. Each turn is a dict with role ('user' or 'assistant') and content (the text, or a list of content blocks). Claude reads the entire history to understand context and returns its response; your code then appends that response to the list.
How it works: The API is stateless, so when you call client.messages.create() it sees exactly what you send: the full conversation thread, not just your latest question. Claude's transformer architecture processes all prior messages to build context before generating the next response. This is why order matters: list messages in the order they occurred, with each user message followed by the assistant response it received.
When to use it: Always use a message array for any conversational interface. Start with an empty array for the first turn, append the user's message, call the API, append Claude's response to the array, then repeat. This keeps code clean and prevents context loss.
Request code
from anthropic import Anthropic

# The client reads ANTHROPIC_API_KEY from the environment
client = Anthropic()

# The full conversation so far; starts empty
conversation_history = []

# Turn 1: append the user's message, then call the API
conversation_history.append({
    "role": "user",
    "content": "What is photosynthesis in one sentence?"
})
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=256,
    messages=conversation_history
)

# Append Claude's reply so the next call can see it
assistant_message = response.content[0].text
conversation_history.append({
    "role": "assistant",
    "content": assistant_message
})
print(f"Assistant: {assistant_message}")

# Turn 2: the follow-up only makes sense with the history above
conversation_history.append({
    "role": "user",
    "content": "Can you explain the electron transport chain part more?"
})
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=256,
    messages=conversation_history
)
assistant_message = response.content[0].text
conversation_history.append({
    "role": "assistant",
    "content": assistant_message
})
print(f"Assistant: {assistant_message}")

Authentication
Set your Anthropic API key as an environment variable before running code: export ANTHROPIC_API_KEY='sk-ant-...'. The Python SDK reads this automatically when you instantiate Anthropic().
Response shape
| Field | Description |
|---|---|
| id | msg_xxxxx: unique identifier for this response |
| type | "message": always this value |
| role | "assistant": the role of the responder |
| content | array of content blocks, e.g. [{"type": "text", "text": "..."}] |
| model | "claude-opus-4-6": which model version generated this |
| stop_reason | "end_turn", "max_tokens", or "stop_sequence": why the response ended |
| stop_sequence | null, or the custom stop sequence that was matched if you provided one |
| usage | object with input_tokens and output_tokens counts for this call |
Field guide
content Always an array. For text responses, access the text via response.content[0].text: do not assume it's a string directly.
usage Shows token consumption for this single turn. Track this to estimate monthly API cost and avoid surprises.
stop_reason If "max_tokens", Claude hit the token limit and was cut off, so the response is incomplete; raise max_tokens and retry. If "end_turn", Claude finished naturally.
id A hidden field that matters: use this in logging or error tracking to correlate issues with Anthropic support. Essential in production.
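The field guide can be condensed into a small helper. This is a minimal sketch, assuming the response object exposes the attributes described in the table (id, content, stop_reason, usage). The name summarize_response is hypothetical, and the SimpleNamespace object only mimics the response shape so the snippet runs without an API call.

```python
from types import SimpleNamespace


def summarize_response(response):
    """Collect the fields from the response-shape table into a plain dict.

    Hypothetical helper: it relies only on the attribute names listed in
    the table (id, content, stop_reason, usage).
    """
    # content is a list of blocks; keep only the text blocks and join them.
    text = "".join(
        block.text for block in response.content
        if getattr(block, "type", None) == "text"
    )
    return {
        "id": response.id,  # log this for error tracking
        "text": text,
        "truncated": response.stop_reason == "max_tokens",
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
    }


# Stand-in object that mimics the response shape (not a real API response).
fake_response = SimpleNamespace(
    id="msg_example",
    content=[SimpleNamespace(type="text", text="Plants turn light into sugar.")],
    stop_reason="end_turn",
    usage=SimpleNamespace(input_tokens=15, output_tokens=8),
)

summary = summarize_response(fake_response)
print(summary["text"], summary["truncated"])
```

Joining every text block, rather than reading only content[0], is a design choice for this sketch: it keeps working if a response ever carries more than one block.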
Setup trap
The API key must be available before your first request. If you call Anthropic() and the env var isn't set, the client has no credentials and your first API call fails with an authentication-related error. Many tutorials skip this step and assume the reader knows to export the key.
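One way to make this failure obvious is to check for the key yourself before creating the client. This is a minimal sketch: require_api_key is a hypothetical helper, not part of the SDK, and the env parameter exists only so the check can run against a plain dict instead of the real environment.

```python
import os


def require_api_key(env=None):
    """Return the API key, or raise a clear error if it is missing.

    Hypothetical helper; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set. Export it before creating the client."
        )
    return key


# Example with an explicit mapping instead of the real environment.
print(require_api_key({"ANTHROPIC_API_KEY": "sk-ant-example"}))
```

Calling require_api_key() with no arguments at startup turns a confusing failure on the first request into an immediate, readable one.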
Cost
Each call charges for all tokens in the message history plus the response. If each turn adds about 100 tokens to the history, the 10th call alone sends roughly 1,000 input tokens, and the ten calls together send roughly 100 + 200 + ... + 1,000 = 5,500 input tokens, plus response tokens. This compounds: total input grows roughly with the square of the number of turns. For long conversations (50+ turns), consider summarizing old messages or using a different architecture.
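To make the compounding concrete, here is a small arithmetic sketch under a simplifying assumption: every turn adds a fixed number of tokens to the history, so call k sends about k × tokens_per_turn input tokens. Real token counts vary per message, and both function names are hypothetical.

```python
def input_tokens_per_call(turns, tokens_per_turn):
    """Approximate input tokens sent on each call, first to last."""
    return [k * tokens_per_turn for k in range(1, turns + 1)]


def cumulative_input_tokens(turns, tokens_per_turn):
    """Approximate input tokens billed across the whole conversation."""
    return sum(input_tokens_per_call(turns, tokens_per_turn))


per_call = input_tokens_per_call(10, 100)
print(per_call[-1])                      # 1000 input tokens on the 10th call
print(cumulative_input_tokens(10, 100))  # 5500 input tokens across all 10 calls
```

Under the same assumption, 50 turns come to about 127,500 cumulative input tokens, which is why long conversations need a different strategy.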
Rate limits
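Rate limits depend on your account tier. Entry-level access is restrictive (on the order of ~5 API calls per minute; check the limits page in the Anthropic Console for your current values), and multi-turn conversations that make one call per user input hit this quickly in testing. Production accounts have higher limits, but add delays between calls or retry with backoff if you exceed them.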
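The "delays" approach is commonly implemented as retry with exponential backoff. The sketch below is generic and runs offline: call_with_backoff is a hypothetical helper, and FakeRateLimit stands in for the SDK's rate-limit exception. In real code you would presumably pass the SDK's RateLimitError class as retry_on.

```python
import time


def call_with_backoff(fn, retry_on, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff when retry_on is raised."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller see the error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...


class FakeRateLimit(Exception):
    """Stand-in for a rate-limit exception so the sketch runs offline."""


attempts = {"count": 0}


def flaky():
    # Fails twice, then succeeds, to exercise the retry path.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimit()
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky, FakeRateLimit, sleep=delays.append)
print(result, delays)
```

Passing sleep as a parameter keeps the helper easy to test; in production the default time.sleep applies.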
Common gotcha
Developers often forget to append Claude's response back to the message array before the next turn. Without this, each new question loses all previous context because the API only sees the new message. The array must grow with every exchange.
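One way to avoid this mistake is to route every exchange through a single function that appends both sides. This is a minimal sketch: run_turn and send are hypothetical names, and fake_send stands in for the real API call so the snippet runs offline.

```python
def run_turn(history, user_text, send):
    """Append the user message, get a reply, and append the reply too.

    `send` is a hypothetical callable that takes the message list and
    returns the assistant's text; in real code it would wrap the API call.
    """
    history.append({"role": "user", "content": user_text})
    try:
        reply = send(history)
    except Exception:
        history.pop()  # drop the unanswered user message so history stays paired
        raise
    history.append({"role": "assistant", "content": reply})
    return reply


# Offline stand-in for the API: reports how many messages it was shown.
def fake_send(messages):
    return f"I can see {len(messages)} message(s)."


history = []
run_turn(history, "What is photosynthesis?", fake_send)
run_turn(history, "Tell me more.", fake_send)
print(len(history))  # 4: two user messages and two assistant replies
```

Removing the user message when the call fails is a design choice for this sketch; it keeps the array in strict user/assistant pairs so a retry does not duplicate the question.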
Error recovery
APIConnectionError, AuthenticationError, RateLimitError, APIStatusError
Experienced dev note
In production, never rely on a message array held in memory across sessions. Store it in a database (PostgreSQL JSONB, DynamoDB, Firestore) keyed by user/conversation ID. On each new user message, reconstruct the array from the database, append the new message, call the API, then write the response back. This scales and survives restarts. Also monitor token usage per user to catch abuse or loops early; a user accidentally creating a feedback loop can drain credits fast.
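The load, append, and write-back cycle can be sketched with Python's built-in sqlite3 module. SQLite is used here only as a stand-in so the example is self-contained; the table layout and the function names (init_store, save_message, load_history) are assumptions for illustration, not a prescribed schema.

```python
import sqlite3


def init_store(conn):
    """Create the messages table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               conversation_id TEXT NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )


def save_message(conn, conversation_id, role, content):
    """Persist one message for the given conversation."""
    conn.execute(
        "INSERT INTO messages (conversation_id, role, content) VALUES (?, ?, ?)",
        (conversation_id, role, content),
    )
    conn.commit()


def load_history(conn, conversation_id):
    """Rebuild the message array for one conversation in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE conversation_id = ? ORDER BY id",
        (conversation_id,),
    ).fetchall()
    return [{"role": role, "content": content} for role, content in rows]


conn = sqlite3.connect(":memory:")  # in-memory database, just for the demo
init_store(conn)
save_message(conn, "conv-1", "user", "What is photosynthesis?")
save_message(conn, "conv-1", "assistant", "It converts light into chemical energy.")
save_message(conn, "conv-2", "user", "Unrelated conversation.")

history = load_history(conn, "conv-1")
print(history)
```

The list returned by load_history has the same shape as the messages parameter, so it can be passed to the API call after appending the new user message.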
Check your understanding
If a user sends 5 messages in a conversation and Claude responds to each, how many messages should your array contain before you make the 6th API call? Why does the number matter?
Answer hint
The array should contain 10 messages of history (5 user + 5 assistant responses); appending the 6th user message makes 11 in the request itself. The number matters because the API charges for every token in the messages array, not just the new turn, so longer conversations become more expensive per call.