
How to set max tokens in Claude API

Quick answer
Pass the max_tokens parameter to client.messages.create() to cap the length of Claude's response. max_tokens is required on every Messages API request. Set it to a positive integer: the maximum number of output tokens the model may generate. The model may stop earlier, but it will not go past this limit.

PREREQUISITES

  • Python 3.8+
  • Anthropic API key
  • pip install "anthropic>=0.20"

Setup

Install the anthropic Python SDK and set your API key as an environment variable.

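  • Install the SDK: pip install "anthropic>=0.20" (quote the requirement so the shell does not treat > as an output redirect)
  • Set the environment variable: export ANTHROPIC_API_KEY='your_api_key' (Linux/macOS), set ANTHROPIC_API_KEY=your_api_key (Windows Command Prompt), or $env:ANTHROPIC_API_KEY='your_api_key' (Windows PowerShell)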
bash
pip install "anthropic>=0.20"
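The examples in this guide read the key with os.environ["ANTHROPIC_API_KEY"], which fails with a bare KeyError when the variable is not set. The sketch below uses a hypothetical helper, require_api_key, to fail early with a clearer message. It needs only the standard library. If you omit api_key entirely, anthropic.Anthropic() reads ANTHROPIC_API_KEY from the environment on its own.

```python
import os


def require_api_key(env=None):
    """Return the Anthropic API key, or raise a readable error if it is missing.

    require_api_key is a hypothetical helper for illustration. `env` defaults to
    os.environ but can be any mapping, which makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set. Export it in the shell that runs Python, "
            "for example: export ANTHROPIC_API_KEY='your_api_key'"
        )
    return key
```

With the SDK installed, you would use it as client = anthropic.Anthropic(api_key=require_api_key()).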

Step by step

Use the max_tokens parameter in the client.messages.create() call to control the maximum tokens generated by Claude. Below is a complete example.

python
import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
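    max_tokens=100,  # required: upper bound on the tokens Claude may generate in this reply
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Explain how to set max tokens in Claude API."}]
)

# response.content is a list of content blocks; the first one holds the text reply.
print(response.content[0].text)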
output
At most 100 tokens of explanation. If the answer needs more room, generation stops at the cap (possibly mid-sentence) and response.stop_reason is "max_tokens".
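Because max_tokens is a hard cap, it is worth checking whether a reply actually hit it. Messages API responses carry stop_reason (set to "max_tokens" when the cap was reached) and usage.output_tokens (the number of tokens actually generated). The sketch below reads only those two fields. describe_stop is a hypothetical helper, and the SimpleNamespace stand-in lets it run without an API call.

```python
from types import SimpleNamespace


def describe_stop(response, max_tokens):
    """Summarize whether a Messages API response was cut off by max_tokens.

    Hypothetical helper: it only reads response.stop_reason and
    response.usage.output_tokens.
    """
    used = response.usage.output_tokens
    if response.stop_reason == "max_tokens":
        return f"truncated: hit the {max_tokens}-token cap ({used} tokens generated)"
    return f"complete: stopped with {response.stop_reason!r} after {used} of {max_tokens} tokens"


# Stand-in shaped like a real response, so the helper runs without calling the API.
fake = SimpleNamespace(stop_reason="max_tokens", usage=SimpleNamespace(output_tokens=100))
print(describe_stop(fake, max_tokens=100))
```

With a real call, run print(describe_stop(response, max_tokens=100)) after the request in the step-by-step example.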

Common variations

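You can raise or lower max_tokens to change the response-length cap, or switch models by changing the model parameter; the highest allowed max_tokens value depends on the model. The SDK also provides an async client (anthropic.AsyncAnthropic) and streaming responses, and both accept max_tokens in the same way.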

python
import asyncio
import os
import anthropic

async def main():
    # AsyncAnthropic is the async client; its messages.create() is awaited directly.
    client = anthropic.AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    response = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=50,
        system="You are a helpful assistant.",
        messages=[{"role": "user", "content": "Summarize the benefits of setting max tokens."}]
    )
    print(response.content[0].text)

asyncio.run(main())
output
A summary of at most 50 output tokens, which may end mid-sentence if the cap is reached.
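Streaming applies max_tokens in the same way. A minimal sketch, assuming the synchronous client: client.messages.stream(...) is used as a context manager, its text_stream yields text chunks as they arrive, and get_final_message() returns the complete message, including stop_reason. The helper name stream_reply is hypothetical, and the client is passed in rather than created inside so the function can be exercised with a stand-in object.

```python
def stream_reply(client, prompt, max_tokens=200, model="claude-3-5-sonnet-20241022"):
    """Stream a reply, printing text as it arrives, and return the final message.

    Hypothetical helper. `client` is expected to be anthropic.Anthropic();
    max_tokens caps the streamed output exactly as in messages.create().
    """
    with client.messages.stream(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
        print()
        # The final message carries stop_reason, e.g. "max_tokens" if the cap was hit.
        return stream.get_final_message()
```

For example, final = stream_reply(anthropic.Anthropic(), "Summarize the benefits of setting max tokens.", max_tokens=50), then inspect final.stop_reason.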

Troubleshooting

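If responses are cut off, check response.stop_reason: "max_tokens" means the reply reached the cap, so increase max_tokens. The value cannot exceed the model's maximum output length (8,192 tokens for claude-3-5-sonnet-20241022); larger values are rejected by the API. If responses are too long, lowering max_tokens only truncates them; to get shorter, complete answers, ask for brevity in the system prompt or user message.

If the script fails before reaching the API, check that ANTHROPIC_API_KEY is set in the environment of the process running Python: os.environ["ANTHROPIC_API_KEY"] raises KeyError when it is missing, and an invalid key produces an authentication error from the API.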
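When truncation is the problem, one option is to retry with a larger budget. The sketch below, a hypothetical helper named create_with_budget, doubles max_tokens while stop_reason is "max_tokens", up to a ceiling. The default ceiling of 8192 is an assumption based on the output limit of claude-3-5-sonnet-20241022; set it to your model's limit. Each retry regenerates the full reply and is billed again, so a well-chosen starting value is usually cheaper.

```python
def create_with_budget(client, messages, model="claude-3-5-sonnet-20241022",
                       start=256, ceiling=8192):
    """Call client.messages.create(), doubling max_tokens while the reply is truncated.

    Hypothetical helper: `client` is expected to be anthropic.Anthropic().
    `ceiling` is an assumption (8192 matches claude-3-5-sonnet-20241022);
    use the output limit of whichever model you call.
    """
    budget = min(start, ceiling)
    while True:
        response = client.messages.create(model=model, max_tokens=budget, messages=messages)
        # Stop when the reply finished on its own, or when no larger budget is allowed.
        if response.stop_reason != "max_tokens" or budget >= ceiling:
            return response
        budget = min(budget * 2, ceiling)
```

Usage: response = create_with_budget(anthropic.Anthropic(), [{"role": "user", "content": "Explain how to set max tokens in Claude API."}]).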

Key Takeaways

  • max_tokens is required in client.messages.create() and caps the length of Claude's response.
  • Use the latest anthropic SDK and pass your API key via os.environ.
  • Choose max_tokens within the model's output limit; it truncates replies rather than making them more concise.
  • Async calls and streaming are supported for advanced use cases.
  • If replies are truncated, check response.stop_reason; if requests fail, check ANTHROPIC_API_KEY and the model's token limit.
Verified 2026-04 · claude-3-5-sonnet-20241022