How to reduce token usage in Claude API calls
Quick answer
To reduce token usage in Claude API calls, minimize prompt length by removing unnecessary context and using concise instructions. Also, limit max_tokens in the request and reuse conversation history selectively to avoid redundant tokens.
Prerequisites
- Python 3.8+
- Anthropic API key
- pip install anthropic>=0.20
Setup
Install the Anthropic Python SDK and set your API key as an environment variable to authenticate requests.
pip install anthropic>=0.20
Step by step
This example demonstrates how to reduce token usage by limiting max_tokens, trimming prompt content, and selectively including conversation history.
import anthropic
client = anthropic.Anthropic()
# Minimal system prompt
system_prompt = "You are a helpful assistant."
# Concise user message
user_message = "Summarize the following text briefly: 'Anthropic develops AI models focused on safety and reliability.'"
# Create a short messages list to reduce tokens
messages = [{"role": "user", "content": user_message}]
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=100,  # Limit max tokens to reduce output length
    system=system_prompt,
    messages=messages,
)

print(response.content[0].text)
Output
Anthropic builds AI models that prioritize safety and reliability.
Common variations
You can further reduce tokens by:
- Using shorter system prompts or omitting them if context allows.
- Truncating or summarizing previous conversation history before sending.
- Adjusting max_tokens to control response length.
- Using smaller models like claude-3-5-haiku-20241022 for less verbose outputs.
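The history-truncation idea above can be sketched as a small helper that keeps only the most recent turns before each request. The function name and the cutoff of 4 messages are illustrative choices, not part of the SDK; the system prompt is assumed to be passed separately via the system parameter, so it is never dropped here:

```python
def trim_history(messages, max_messages=4):
    """Keep only the most recent messages to bound prompt size."""
    if len(messages) <= max_messages:
        return messages
    trimmed = messages[-max_messages:]
    # Claude expects the conversation to start with a user turn,
    # so drop any leading assistant messages after truncation.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```

You would call trim_history(messages) immediately before client.messages.create(...), so only the trimmed list is sent.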
Troubleshooting
If you receive errors about token limits, reduce the length of your prompt or lower max_tokens (the context window must fit both input and output). If responses are cut off or too short, increase max_tokens gradually. Monitor token usage in the Anthropic Console to optimize further.
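To catch oversized prompts before sending them, you can pre-check size with a rough heuristic: English text averages roughly 3-4 characters per token, so the divisor below is an approximation, not an exact count (the usage object the API returns is authoritative). Both function names and the default budget are illustrative:

```python
def rough_token_estimate(text, chars_per_token=4):
    """Very rough token estimate: character count / average chars per token."""
    return max(1, len(text) // chars_per_token)

def fits_budget(prompt, budget=1000):
    """Check whether a prompt is likely to fit within a token budget."""
    return rough_token_estimate(prompt) <= budget
```

A pre-check like fits_budget(user_message) lets you trim or summarize input before the request fails, rather than after.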
Key Takeaways
- Limit max_tokens to control output length and reduce token consumption.
- Keep prompts concise by removing unnecessary context and instructions.
- Reuse only essential conversation history to avoid redundant tokens.
- Choose smaller or more focused models for less verbose responses.
- Monitor token usage regularly to optimize your API calls.