stop_reason: why generation ended
Why this matters
In production, you need to know whether Claude actually completed the response or ran out of output tokens. A truncated response treated as complete causes silent correctness bugs. `stop_reason` lets you detect incomplete responses and decide whether to re-prompt, truncate gracefully, or alert the user.
Explanation
When Claude generates a response, the API returns a stop_reason field that explains why generation stopped. The most common values are "end_turn" (Claude finished naturally), "max_tokens" (generation hit the token limit you set), and "stop_sequence" (generation hit a stop sequence you defined). Other values exist for other situations, such as "tool_use" when Claude requests a tool call. Think of it as the 'reason code' for why generation ended.
Under the hood, Claude generates tokens one at a time. The API checks after each token whether a stop condition has been met: the model decided to end (end_turn), the output reached max_tokens, or a stop sequence appeared. The stop_reason tells your code which condition triggered first. This is different from HTTP status codes; the request succeeds either way, but the response might be incomplete.
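The "which condition triggered first" idea can be illustrated with a toy loop. This is only a sketch of the logic described above, not how the service is implemented: `toy_generate`, the `END` marker, and the scripted token list are all hypothetical names made up for the illustration.

```python
from typing import Iterable, List, Optional, Tuple

END = "<end>"  # hypothetical marker standing in for the model's own decision to stop


def toy_generate(
    tokens: Iterable[str],
    max_tokens: int,
    stop_sequences: Optional[List[str]] = None,
) -> Tuple[str, str]:
    """Consume a scripted token stream and report which stop condition fired first."""
    stop_sequences = stop_sequences or []
    text = ""
    count = 0
    for token in tokens:
        if token == END:
            return text, "end_turn"
        text += token
        count += 1
        for seq in stop_sequences:
            if seq in text:
                # Drop the stop sequence and anything after it.
                return text[: text.index(seq)], "stop_sequence"
        if count >= max_tokens:
            return text, "max_tokens"
    return text, "end_turn"  # stream exhausted: treat as a natural finish


script = ["The", " sky", " is", " blue", ".", END]
print(toy_generate(script, max_tokens=10))                         # finishes naturally
print(toy_generate(script, max_tokens=3))                          # cut off by the budget
print(toy_generate(script, max_tokens=10, stop_sequences=[" is"]))  # cut at the stop sequence
```

The same scripted stream yields three different reasons depending only on the limits passed in, which is why the text alone cannot tell you whether a response is complete.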
Always check stop_reason in production. If it's "max_tokens", your response is definitely truncated and you should either increase max_tokens for a retry, show a 'response was incomplete' message to the user, or gracefully summarize what you got. Ignoring this is how you ship broken AI features.
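One of the options above, showing a 'response was incomplete' message, might look like the following minimal sketch. `render_for_user` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that mimic only the `stop_reason` and `content[0].text` attributes used in this lesson, so the example runs without an API call.

```python
from types import SimpleNamespace


def render_for_user(response) -> str:
    """Return the response text, flagging it when generation hit max_tokens."""
    text = response.content[0].text
    if response.stop_reason == "max_tokens":
        # Truncated: keep what we got, but tell the user it is incomplete.
        return text + "\n\n[Response was cut off at the token limit.]"
    return text


# Stand-in responses (not real SDK objects).
complete = SimpleNamespace(
    stop_reason="end_turn",
    content=[SimpleNamespace(text="Qubits can be in superposition.")],
)
truncated = SimpleNamespace(
    stop_reason="max_tokens",
    content=[SimpleNamespace(text="Qubits can be in")],
)

print(render_for_user(complete))
print(render_for_user(truncated))
```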
Request code
from anthropic import Anthropic

client = Anthropic()
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=50,
    messages=[
        {"role": "user", "content": "Explain quantum computing in detail."}
    ],
)
print(f"Stop reason: {response.stop_reason}")
print(f"Response text: {response.content[0].text}")
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
Authentication
Set your Anthropic API key as an environment variable before running code: `export ANTHROPIC_API_KEY="sk-ant-..."`. The Anthropic Python SDK reads this automatically at client instantiation time.
Response shape
| Field | Description |
|---|---|
| stop_reason | String, e.g. 'end_turn', 'max_tokens', or 'stop_sequence' |
| content | List of content blocks, typically one TextBlock with the generated text |
| content[0].text | The actual response text from Claude |
| usage.input_tokens | Count of tokens in your prompt |
| usage.output_tokens | Count of tokens Claude generated |
Field guide
stop_reason: The exit condition. Check this first in production code. 'end_turn' means the response is safe to use. 'max_tokens' means the response is incomplete and you should retry with a higher max_tokens or tell the user it was cut off.
content[0].text: The actual response, but only treat it as complete if stop_reason is 'end_turn'. If stop_reason is 'max_tokens', this text is cut off mid-thought.
usage.output_tokens: A useful cross-check. If output_tokens equals your max_tokens setting, generation almost certainly hit the limit and the response is truncated. Treat stop_reason as the authoritative signal and use this count as corroboration when auto-detecting incomplete responses.
Setup trap
The Anthropic SDK reads ANTHROPIC_API_KEY from the environment at client instantiation. If you set the variable after creating the client, that client won't pick it up. Set the environment variable before running your script, or pass the key explicitly: `client = Anthropic(api_key="sk-ant-...")`. Test with a simple request first to confirm auth is working before debugging stop_reason logic.
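A fail-fast check before creating the client can surface this trap early. The sketch below assumes nothing about the SDK beyond the `ANTHROPIC_API_KEY` variable name and the `api_key` argument mentioned above; `resolve_api_key` is a hypothetical helper, not part of the SDK.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit key if given, else ANTHROPIC_API_KEY; raise if neither is set."""
    key = explicit or env.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set and no api_key was passed explicitly"
        )
    return key


# Demonstration with plain dicts standing in for the process environment.
print(resolve_api_key(env={"ANTHROPIC_API_KEY": "sk-ant-example"}))
try:
    resolve_api_key(env={})
except RuntimeError as exc:
    print(f"Setup problem: {exc}")
```

In a real script the result would be passed along as `Anthropic(api_key=resolve_api_key())`, so a missing key fails with a clear message before any request is made.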
Cost
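A high max_tokens value (e.g., 100,000) does not raise your bill by itself. You are charged for the tokens actually used, as reported in usage.input_tokens and usage.output_tokens, so a response that ends with 'end_turn' at token 1,000 is billed for about 1,000 output tokens regardless of the cap. What a high cap removes is the upper bound on a runaway generation, so set max_tokens to a reasonable ceiling for your use case, not an arbitrarily high one. Output tokens are billed at a higher per-token rate than input tokens (check the current pricing page for the exact ratio on Opus), and a response cut off by max_tokens is still billed for every token generated, so truncated responses that you discard or regenerate are especially wasteful.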
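As a rough illustration of the arithmetic, assuming cost is computed from the usage counts the API returns: the rates below are placeholders chosen for easy math, not real prices, and `estimate_cost` is a hypothetical helper. Note that max_tokens does not appear in the formula.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_mtok: float,
    output_rate_per_mtok: float,
) -> float:
    """Estimate request cost from usage counts and per-million-token rates."""
    total = input_tokens * input_rate_per_mtok + output_tokens * output_rate_per_mtok
    return total / 1_000_000


# Placeholder rates (NOT real pricing): 1.0 per million input, 5.0 per million output.
IN_RATE, OUT_RATE = 1.0, 5.0

# A discarded truncated attempt plus a successful retry costs more than one good call.
truncated_attempt = estimate_cost(200, 50, IN_RATE, OUT_RATE)
successful_retry = estimate_cost(200, 1000, IN_RATE, OUT_RATE)
print(f"single good call: {successful_retry:.6f}")
print(f"truncated + retry: {truncated_attempt + successful_retry:.6f}")
```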
Common gotcha
Developers often check that response.content[0].text is non-empty and assume the response is complete. An incomplete response can still have 500+ characters of text; it's just truncated mid-sentence. Always check stop_reason, not the length of the text. A common production bug is checking `if response.content[0].text:` instead of `if response.stop_reason == "end_turn"`.
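The difference between the two checks is easy to see on a truncated response. The object below is a stand-in built with `SimpleNamespace` that mimics only the attributes used here, and both function names are hypothetical.

```python
from types import SimpleNamespace

# Stand-in for a response that was cut off by the token limit.
truncated = SimpleNamespace(
    stop_reason="max_tokens",
    content=[SimpleNamespace(text="Quantum computers use qubits, which")],
)


def naive_is_complete(response) -> bool:
    """The buggy check: any non-empty text counts as complete."""
    return bool(response.content[0].text)


def is_complete(response) -> bool:
    """The check this lesson recommends: look at stop_reason."""
    return response.stop_reason == "end_turn"


print(naive_is_complete(truncated))  # True: the bug lets truncated text through
print(is_complete(truncated))        # False: truncation is detected
```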
Error recovery
AuthenticationError | RateLimitError | APIConnectionError | BadRequestError
Experienced dev note
A production pattern worth adopting: treat the response check as a decision tree, not just a log line. If stop_reason is 'max_tokens', automatically re-run with double the max_tokens (up to a ceiling). If it's 'end_turn', cache the response as complete. If it's 'stop_sequence', generation hit one of your custom stop sequences, which is usually the intended outcome when you use them to delimit structured output. The payoff is a self-healing pipeline: bad responses don't fail silently; they trigger controlled retries.
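The doubling retry could be sketched as follows. `create_with_retry` and `fake_call` are hypothetical names; the helper takes any callable that accepts a max_tokens value and returns an object with a `stop_reason` attribute, so it can be exercised here with a fake instead of a live request.

```python
from types import SimpleNamespace
from typing import Callable, Tuple


def create_with_retry(
    call: Callable[[int], object], max_tokens: int, ceiling: int
) -> Tuple[object, int]:
    """Call `call(budget)`, doubling the budget while the reply is truncated.

    Returns the last response and the budget used. If the ceiling is reached
    and the reply is still truncated, that truncated response is returned and
    the caller must check stop_reason.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    budget = min(max_tokens, ceiling)
    while True:
        response = call(budget)
        if response.stop_reason != "max_tokens" or budget >= ceiling:
            return response, budget
        budget = min(budget * 2, ceiling)


def fake_call(budget: int, needed: int = 180):
    """Fake model call: the answer needs `needed` tokens to finish."""
    reason = "end_turn" if budget >= needed else "max_tokens"
    return SimpleNamespace(stop_reason=reason)


response, used = create_with_retry(fake_call, max_tokens=50, ceiling=1000)
print(response.stop_reason, used)  # budgets tried: 50, 100, 200

response, used = create_with_retry(fake_call, max_tokens=50, ceiling=100)
print(response.stop_reason, used)  # still truncated at the ceiling
```

With the real client, `call` could be a small wrapper around `client.messages.create` that forwards the budget as `max_tokens`. Each retry is a new request that regenerates the response from scratch, so the ceiling also bounds cost.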
Check your understanding
You set max_tokens=100 and ask Claude 'Explain photosynthesis in one paragraph.' The response reports usage.output_tokens of 100 and stop_reason is 'max_tokens'. Why is Claude not finishing, and what should your code do?
Answer hint
Claude is not refusing or confused; it's simply hitting your token budget. The response was most likely cut off mid-word or mid-sentence. Your code should either retry with a larger budget such as max_tokens=200, show the user a truncated preview with a 'continue' option, or log a warning. Never treat a max_tokens stop as 'complete.' The token count tells you nothing about semantic completeness.