API Beginner easy · 4 min

stop_reason: why generation ended

What you will learn

The <code>stop_reason</code> field tells you whether Claude finished naturally, hit a token limit, or stopped for another reason. Production code needs that signal to decide what to do with a response.

Why this matters

In production, you need to know if Claude actually completed the response or ran out of space. A truncated response treated as complete causes silent correctness bugs. <code>stop_reason</code> lets you detect incomplete responses and decide whether to re-prompt, truncate gracefully, or alert the user.

Skip if: You're testing locally with tiny responses guaranteed to finish. In that case you can skip checking <code>stop_reason</code>. But the moment you move to production or user-facing code, always check it. Never assume a response is complete.

Explanation

When Claude generates a response, the API returns a stop_reason field that explains why generation stopped. The most common values are "end_turn" (Claude finished naturally), "max_tokens" (generation hit the token limit you set), and "stop_sequence" (generation hit a stop sequence you defined). Other values exist, such as "tool_use" when Claude requests a tool call, so handle unexpected values defensively. Think of it as the 'reason code' for why the response ended.

Under the hood, Claude generates tokens one at a time. Conceptually, generation continues until a stop condition is met: the model decides to end its turn (end_turn), the output reaches max_tokens, or a stop sequence appears. The stop_reason tells your code which condition triggered first. This is different from an HTTP status code: the request succeeds either way, but the response might be incomplete.

Always check stop_reason in production. If it's "max_tokens", your response is definitely truncated and you should either increase max_tokens for a retry, show a 'response was incomplete' message to the user, or gracefully summarize what you got. Ignoring this is how you ship broken AI features.
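The decision described above can be sketched as a small mapping from stop_reason to a handling policy. This is a minimal illustration, not an official pattern: the `Action` enum and `next_action` function are hypothetical names, and treating "stop_sequence" as usable assumes you set the stop sequence on purpose.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical handling policies for a finished request."""
    USE = "use the response as-is"
    RETRY_LARGER = "retry with a larger max_tokens"
    REVIEW = "inspect the response before using it"


def next_action(stop_reason: Optional[str]) -> Action:
    """Map a stop_reason string to a handling policy."""
    if stop_reason in ("end_turn", "stop_sequence"):
        # Claude finished, or a stop sequence you defined was matched.
        return Action.USE
    if stop_reason == "max_tokens":
        # The output was cut off by the token cap.
        return Action.RETRY_LARGER
    # Unknown or missing values are handled conservatively.
    return Action.REVIEW
```

In real code you would call `next_action(response.stop_reason)` right after `client.messages.create(...)` returns.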

Request code

```python
from anthropic import Anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = Anthropic()

# max_tokens is deliberately small so a detailed answer is likely to be cut off.
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=50,
    messages=[
        {"role": "user", "content": "Explain quantum computing in detail."}
    ]
)

# Expect "max_tokens" here if the 50-token cap was reached.
print(f"Stop reason: {response.stop_reason}")
print(f"Response text: {response.content[0].text}")
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
```

Authentication

Set your Anthropic API key as an environment variable before running code: `export ANTHROPIC_API_KEY="sk-ant-..."`. The Anthropic Python SDK reads this automatically at client instantiation time.

Response shape

Field	Description
`stop_reason`	String: commonly 'end_turn', 'max_tokens', or 'stop_sequence'; other values such as 'tool_use' are possible
`content`	List of content blocks, typically one TextBlock with the generated text
`content[0].text`	The actual response text from Claude
`usage.input_tokens`	Count of tokens in your prompt
`usage.output_tokens`	Count of tokens Claude generated
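Because `content` is a list, it can be safer to join all text blocks than to read only `content[0].text`. The sketch below assumes each block exposes a `type` attribute and that text blocks carry `.text`; `collect_text` is a hypothetical helper, and `fake` is a stand-in object with made-up values, not a real API response.

```python
from types import SimpleNamespace


def collect_text(response) -> str:
    """Join the text of every text block in response.content."""
    return "".join(
        block.text
        for block in response.content
        if getattr(block, "type", None) == "text"
    )


# Stand-in shaped like the fields in the table above (illustration only).
fake = SimpleNamespace(
    stop_reason="max_tokens",
    content=[SimpleNamespace(type="text", text="Quantum computing uses")],
    usage=SimpleNamespace(input_tokens=14, output_tokens=50),
)

print(fake.stop_reason, "|", collect_text(fake))
```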

Field guide

stop_reason

The exit condition. Check this first in production code. 'end_turn' means Claude finished on its own and the response is safe to use. 'max_tokens' means the response is incomplete; retry with a higher max_tokens or handle the truncation explicitly.

content[0].text

The actual response text. Treat it as complete only if stop_reason is 'end_turn' (or 'stop_sequence', when you stopped generation deliberately). If stop_reason is 'max_tokens', this text is cut off mid-thought.

usage.output_tokens

If output_tokens equals your max_tokens setting, the response almost certainly hit the limit and is truncated. Treat stop_reason as the authoritative signal, and use the token count as a cross-check or as a metric for how close your responses run to the cap.

Setup trap

The Anthropic SDK reads ANTHROPIC_API_KEY from environment at client instantiation. If you set it after creating the client, it won't work. Set the environment variable before running your script, or pass it explicitly: `client = Anthropic(api_key="sk-ant-...")`. Test with a simple request first to confirm auth is working before debugging stop_reason logic.
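One way to surface this trap early is to fail fast when the variable is missing, before any client is created. A minimal sketch follows; `require_api_key` is a hypothetical helper, not part of the SDK.

```python
import os


def require_api_key(env=os.environ) -> str:
    """Return the API key from the environment or raise a clear error."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it before creating the client."
        )
    return key
```

With the SDK installed, this could be used as `client = Anthropic(api_key=require_api_key())`, so a missing key fails with a readable message instead of an authentication error on the first request.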

Cost

A very high max_tokens (e.g., 100,000) does not by itself increase your bill. You are charged for the tokens Claude actually generates, so a response that ends with 'end_turn' at token 1,000 costs the same whether max_tokens was 2,000 or 100,000. The cap still matters as a safety bound, because it limits the worst-case cost and latency of a runaway response. Set max_tokens to a reasonable upper bound for your use case, not arbitrarily high. Output tokens cost several times more than input tokens (about 5x at Opus list prices; check the current pricing page), so a truncated response that has to be regenerated is especially wasteful.
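A rough per-request cost can be estimated from the usage fields. The sketch below assumes cost scales with the `input_tokens` and `output_tokens` reported in `usage`; the prices are parameters you supply from the current pricing page, and `estimate_cost_usd` is a hypothetical helper.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate request cost from token counts and per-million-token prices."""
    total = (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    )
    return total / 1_000_000
```

Calling it with `response.usage.input_tokens` and `response.usage.output_tokens` after each request gives a running view of how much truncated-then-retried responses add to the bill.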

Common gotcha

Developers often check that response.content[0].text is non-empty and assume the response is complete. An incomplete response can still contain 500+ characters of text that simply stops mid-sentence. Always check stop_reason, not the length of the text. The most common production bug is checking `if response.content[0].text:` instead of `if response.stop_reason == "end_turn"`.
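The difference between the two checks is easy to see on a truncated response. This sketch uses a stand-in object rather than a real API response; both function names are hypothetical.

```python
from types import SimpleNamespace


def looks_complete_naive(response) -> bool:
    """Buggy check: any non-empty text passes."""
    return bool(response.content[0].text)


def is_complete(response) -> bool:
    """Safer check: rely on stop_reason, not on the text."""
    return response.stop_reason == "end_turn"


# Stand-in for a response that was cut off by max_tokens.
truncated = SimpleNamespace(
    stop_reason="max_tokens",
    content=[SimpleNamespace(
        type="text",
        text="Photosynthesis is the process by which pla",
    )],
)

print(looks_complete_naive(truncated))  # True: the bug
print(is_complete(truncated))           # False: truncation detected
```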

Error recovery

AuthenticationError

API key is invalid, missing, or expired. Verify ANTHROPIC_API_KEY is set and correct. Check your Anthropic dashboard for key expiration.

RateLimitError

You've exceeded a request or token rate limit for your account tier. Back off exponentially before retrying, and honor the retry-after header if the response includes one. If you hit limits regularly, consider a higher usage tier.

APIConnectionError

Network failure or API outage. Implement retry logic with exponential backoff and check https://status.anthropic.com.

BadRequestError

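Invalid parameters, for example a non-positive max_tokens or a malformed messages list. Verify that max_tokens is a positive integer and that the model ID is valid, such as 'claude-opus-4-6' or 'claude-sonnet-4-6'. An unrecognized model name may surface as NotFoundError rather than BadRequestError.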
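The exponential backoff suggested for RateLimitError and APIConnectionError can be sketched generically. `call_with_backoff` is a hypothetical helper and the delay values are illustrative defaults; note that the SDK client also has its own built-in retry behavior, so a wrapper like this is only needed when you want custom control.

```python
import random
import time


def call_with_backoff(fn, retryable, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep):
    """Call fn(), retrying on the given exception types with backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            # Double the delay each attempt and add jitter to avoid bursts.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

With the SDK installed, `retryable` would typically be `(anthropic.RateLimitError, anthropic.APIConnectionError)`. Errors such as AuthenticationError and BadRequestError should not be retried, since the same request will fail again.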

Experienced dev note

A useful production pattern is to turn the stop_reason check into a decision tree rather than just logging it. If it's 'max_tokens', automatically re-run with double the max_tokens (up to a ceiling). If it's 'end_turn', cache the response as complete. If it's 'stop_sequence', one of your custom stop sequences matched, which is usually the intended outcome for structured output; the response's stop_sequence field reports which one. Handled this way, stop_reason lets you build self-healing pipelines in which truncated responses trigger controlled retries instead of failing silently.
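The doubling retry can be sketched as a small loop. This is one possible shape under stated assumptions: `generate_with_escalation` is a hypothetical helper, and `create` is any callable you supply that takes a max_tokens value and returns an object with a `stop_reason` attribute.

```python
def generate_with_escalation(create, max_tokens, ceiling):
    """Retry with doubled max_tokens while the response is truncated.

    Returns the last response; it may still have stop_reason
    "max_tokens" if the ceiling was reached.
    """
    if max_tokens <= 0 or ceiling <= 0:
        raise ValueError("max_tokens and ceiling must be positive")
    max_tokens = min(max_tokens, ceiling)
    while True:
        response = create(max_tokens)
        if response.stop_reason != "max_tokens" or max_tokens >= ceiling:
            return response
        # Truncated and still under the ceiling: double the budget.
        max_tokens = min(max_tokens * 2, ceiling)
```

With the SDK, `create` could be a lambda that wraps `client.messages.create(...)` and passes the given budget through. Each retry regenerates the full response, so the ceiling also acts as a cost guard; callers should still check the returned stop_reason.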

Check your understanding

You set max_tokens=100 and ask Claude 'Explain photosynthesis in one paragraph.' The response is 100 tokens long and stop_reason is 'max_tokens'. Why is Claude not finishing, and what should your code do?

Answer hint

Claude is not refusing or confused; it simply hit your token budget, so the response is cut off, most likely mid-sentence. Your code should either retry with a higher limit such as max_tokens=200, show the user a truncated preview with a 'continue' option, or log a warning. Never treat a max_tokens stop as 'complete.' The token count tells you nothing about semantic completeness.

VERSION Anthropic SDK 0.94.x (April 2026) uses the modern messages API. Older deprecated patterns used HUMAN_PROMPT/AI_PROMPT constants; those are gone. Always use client.messages.create() with a messages list. The stop_reason field has been available since the Claude 3 launch, and the core values behave the same across models, though newer models and features may add values.
