How to reduce token usage in Claude API calls
Quick answer
To reduce token usage in Claude API calls, minimize prompt length by removing unnecessary context and using concise instructions. Also, limit max_tokens in the request and reuse conversation history selectively to avoid redundant tokens.
Prerequisites
- Python 3.8+
- Anthropic API key
- pip install anthropic>=0.20
Setup
Install the Anthropic Python SDK and set your API key as an environment variable to authenticate requests.
pip install anthropic>=0.20
Step by step
This example demonstrates how to reduce token usage by limiting max_tokens, trimming prompt content, and selectively including conversation history.
import anthropic
client = anthropic.Anthropic()
# Minimal system prompt
system_prompt = "You are a helpful assistant."
# Concise user message
user_message = "Summarize the following text briefly: 'Anthropic develops AI models focused on safety and reliability.'"
# Create a short messages list to reduce tokens
messages = [{"role": "user", "content": user_message}]
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=100,  # Limit max tokens to reduce output length
    system=system_prompt,
    messages=messages,
)

print(response.content[0].text)
Output
Anthropic builds AI models that prioritize safety and reliability.
Common variations
You can further reduce tokens by:
- Using shorter system prompts or omitting them if context allows.
- Truncating or summarizing previous conversation history before sending.
- Adjusting max_tokens to control response length.
- Using smaller models like claude-3-5-haiku-20241022 for less verbose outputs.
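The history-truncation idea above can be sketched as a small helper that keeps only the most recent turns before each request. The function name and the cutoff of 4 messages are illustrative choices, not part of the SDK; the system prompt is assumed to be passed separately via the system parameter, so it is never dropped here:

```python
def trim_history(messages, max_messages=4):
    """Keep only the most recent messages to bound prompt size."""
    if len(messages) <= max_messages:
        return messages
    trimmed = messages[-max_messages:]
    # Claude expects the conversation to start with a user turn,
    # so drop any leading assistant messages after truncation.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```

You would call trim_history(messages) immediately before client.messages.create(...), so only the trimmed list is sent.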
Troubleshooting
If you receive errors about token limits, reduce the length of your prompt or lower max_tokens (the context window must fit both input and output). If responses are cut off or too short, increase max_tokens gradually. Monitor token usage in the Anthropic Console to optimize further.
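To catch oversized prompts before sending them, you can pre-check size with a rough heuristic: English text averages roughly 3-4 characters per token, so the divisor below is an approximation, not an exact count (the usage object the API returns is authoritative). Both function names and the default budget are illustrative:

```python
def rough_token_estimate(text, chars_per_token=4):
    """Very rough token estimate: character count / average chars per token."""
    return max(1, len(text) // chars_per_token)

def fits_budget(prompt, budget=1000):
    """Check whether a prompt is likely to fit within a token budget."""
    return rough_token_estimate(prompt) <= budget
```

A pre-check like fits_budget(user_message) lets you trim or summarize input before the request fails, rather than after.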
Key Takeaways
- Limit max_tokens to control output length and reduce token consumption.
- Keep prompts concise by removing unnecessary context and instructions.
- Reuse only essential conversation history to avoid redundant tokens.
- Choose smaller or more focused models for less verbose responses.
- Monitor token usage regularly to optimize your API calls.