Token counting with get_num_tokens()
Why this matters
LLM pricing is token-based, and context windows are finite. Counting tokens upfront lets you estimate cost, avoid quota overages, and decide whether to chunk or summarize input before API calls.
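The cost estimate is simple arithmetic once you have a token count. The following is a minimal sketch, not part of LangChain: `estimate_input_cost` is a hypothetical helper, and the price of $0.15 per million input tokens is an assumed figure for illustration only, so check your provider's current price list.

```python
def estimate_input_cost(token_count: int, usd_per_million_tokens: float) -> float:
    """Return the estimated input cost in USD for a given token count."""
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    return token_count / 1_000_000 * usd_per_million_tokens


# Assumed price for illustration only; real prices vary by model and change over time.
PRICE_PER_MILLION = 0.15

# In practice token_count would come from llm.get_num_tokens(prompt).
cost = estimate_input_cost(120_000, PRICE_PER_MILLION)
print(f"Estimated input cost: ${cost:.4f}")
```

Output tokens are usually billed separately, so this covers only the prompt side of a request.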
Explanation
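get_num_tokens() is a method on LangChain model instances that tokenizes a string and returns its token count as an integer. For integrations that ship a local tokenizer (ChatOpenAI uses tiktoken), no model API call is made and the count reflects how that model tokenizes the text. Where an integration has no model-specific tokenizer, LangChain falls back to a generic one, so treat the result as an estimate rather than an exact figure.
How it works: Each model family has its own tokenizer (BPE-based for GPT models; other families use schemes such as SentencePiece). You pass raw text, the tokenizer splits it into tokens, and you get back an integer.
When to use it: Before batching large documents, after building a prompt but before sending it to the API, or when implementing a token-aware retrieval strategy. The count is deterministic for a given model and tokenizer version, which makes it useful for precalculation and testing.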
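To make the "text in, integer out" step concrete, here is a toy stand-in that uses only the standard library. It is not BPE or SentencePiece and will not match any real model's counts; `toy_tokenize` and `toy_num_tokens` are hypothetical names that only illustrate the shape of the operation.

```python
import re


def toy_tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)


def toy_num_tokens(text: str) -> int:
    """Tokenize, then return the number of tokens."""
    return len(toy_tokenize(text))


sample = "The quick brown fox jumps over the lazy dog."
tokens = toy_tokenize(sample)
count = toy_num_tokens(sample)
print(tokens)
print(count)

# Same input and same tokenizer: the count does not change between calls.
assert toy_num_tokens(sample) == count
```

Real tokenizers split on learned subword units rather than on words and punctuation, which is why their counts differ from a word count and from each other.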
Analogy
Think of tokens like postage: before mailing a stack of letters, you weigh them to know the cost. get_num_tokens() is the scale. You check the weight (token count) before committing to the shipment (API call).
Code
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

# Placeholder key; in real code, prefer setting the OPENAI_API_KEY environment variable.
llm_openai = ChatOpenAI(model="gpt-4o-mini", api_key="your-key-here")
text = "The quick brown fox jumps over the lazy dog. This is a test string to count tokens."

# Count tokens with the OpenAI client.
token_count_openai = llm_openai.get_num_tokens(text)
print(f"GPT-4o-mini tokens: {token_count_openai}")

# Count the same text with the Anthropic client (reads ANTHROPIC_API_KEY from the environment).
llm_claude = ChatAnthropic(model="claude-3-5-sonnet-20241022")
token_count_claude = llm_claude.get_num_tokens(text)
print(f"Claude tokens: {token_count_claude}")
print(f"\nToken difference: {abs(token_count_openai - token_count_claude)} (models tokenize differently)")

# Repeat the 17-word text 50 times (850 words) to simulate a longer input.
long_text = " ".join([text] * 50)
long_count = llm_openai.get_num_tokens(long_text)
print(f"\n850-word passage: {long_count} tokens")

# Pre-flight check against a threshold kept below GPT-4o-mini's 128K context window.
if long_count > 100000:
    print("Would exceed context window; chunk before sending")
else:
    print("Safe to send in single request")
Output
GPT-4o-mini tokens: 21
Claude tokens: 29

Token difference: 8 (models tokenize differently)

850-word passage: 1050 tokens
Safe to send in single request
What just happened?
The code instantiated two different LLM clients (OpenAI and Anthropic), called get_num_tokens() on the same text string with each client, and got back different counts because the two clients do not tokenize the text the same way. It then repeated the text 50 times and checked whether the resulting token count would fit in the context window, which is the real use case: a pre-flight check before the API call.
Common gotcha
Token counts vary between models. GPT-4o-mini and Claude 3.5 Sonnet tokenize the same text differently, sometimes by 20-30% or more: in the run above, the same two sentences came to 21 tokens on one client and 29 on the other. If you count tokens with one model and send the request to another, your budget calculation is wrong. Always count tokens with the exact LLM instance you will invoke, not a different model.
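The size of the mismatch can be worked out from the two counts reported in the example output (21 and 29). This is a small arithmetic sketch; `relative_difference` is a hypothetical helper, and the 100,000-token extrapolation assumes the same ratio holds for longer text, which it may not.

```python
def relative_difference(count_a: int, count_b: int) -> float:
    """Return how much larger count_b is than count_a, as a fraction of count_a."""
    if count_a <= 0:
        raise ValueError("count_a must be positive")
    return (count_b - count_a) / count_a


gpt_count = 21     # reported in the example output for GPT-4o-mini
claude_count = 29  # reported in the example output for Claude

diff = relative_difference(gpt_count, claude_count)
print(f"Second count is {diff:.0%} higher")

# Illustration only: a prompt budgeted at 100,000 tokens with the first count,
# scaled by the same ratio (an assumption, not a measurement).
scaled = round(100_000 * (1 + diff))
print(f"100,000 tokens on one tokenizer could be about {scaled:,} on the other")
```

A budget that looks comfortable under one tokenizer can overshoot the limit under another, which is why the count and the request should use the same model instance.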
Error recovery
AttributeError: 'ChatOpenAI' object has no attribute 'get_num_tokens'
TypeError: get_num_tokens() missing 1 required positional argument: 'text'
APIConnectionError or AuthenticationError
Experienced dev note
For integrations with a local tokenizer, such as ChatOpenAI, get_num_tokens() is free: it tokenizes locally without calling the model API. This is crucial: you can call it thousands of times for cost modeling, chunk size calculation, and batch optimization without spending money. Many engineers assume it is a billed API call and avoid using it. It is not. Use it liberally in preprocessing and validation logic.
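Because counting is cheap, it can sit inside a loop. The following is one possible sketch of token-aware chunking, not a LangChain API: `chunk_by_tokens` is a hypothetical helper that takes any counting function, and `word_count` is a stand-in so the example runs offline. In practice you would pass `llm.get_num_tokens` for the model you will invoke.

```python
from typing import Callable


def chunk_by_tokens(
    sentences: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> list[str]:
    """Greedily pack sentences into chunks whose token count stays within max_tokens.

    A single sentence that exceeds max_tokens on its own is emitted as its own
    oversized chunk; splitting it further is left to the caller.
    """
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def word_count(text: str) -> int:
    """Stand-in counter (words, not real tokens) so the sketch needs no API key."""
    return len(text.split())


sentences = ["one two three", "four five", "six seven eight nine", "ten"]
chunks = chunk_by_tokens(sentences, max_tokens=5, count_tokens=word_count)
print(chunks)
```

The counter is called once per sentence on the growing chunk, which is only practical because the count is computed locally.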
Check your understanding
You're building a retrieval pipeline that chunks documents and sends them to GPT-4o-mini. You count tokens for a single chunk and get 8,000. Then you retrieve 15 similar chunks and want to send them all as context in one request. What would you check before making that API call, and why might your check fail if you used a different model to count tokens?
Answer hint
A correct answer should mention: (1) checking the total token count (roughly 15 × 8,000 = 120,000 tokens) against GPT-4o-mini's context window (128K), (2) understanding that if you had counted tokens with Claude instead, the total would be different and your calculation would be invalid, (3) the need to always use the same model instance for counting as for invocation.
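The arithmetic behind point (1) can be sketched as follows. The 8,000-token chunk size, the 15 chunks, and the 128K window come from the question and hint; the 10,000-token reserve for instructions, the user question, and the response is an assumed figure, and `fits_in_context` is a hypothetical helper.

```python
CONTEXT_WINDOW = 128_000  # GPT-4o-mini context window, as stated in the hint
TOKENS_PER_CHUNK = 8_000  # count for a single chunk, from the question
NUM_CHUNKS = 15
RESERVED = 10_000         # assumed reserve for instructions, question, and response


def fits_in_context(prompt_tokens: int, reserved: int, context_window: int) -> bool:
    """Return True if the prompt plus the reserved budget fits in the context window."""
    return prompt_tokens + reserved <= context_window


# Approximation: treats every chunk as the same size as the one that was measured.
context_tokens = TOKENS_PER_CHUNK * NUM_CHUNKS
headroom = CONTEXT_WINDOW - context_tokens
ok = fits_in_context(context_tokens, RESERVED, CONTEXT_WINDOW)

print(f"Context tokens: {context_tokens}")
print(f"Headroom: {headroom}")
print(f"Fits with reserve: {ok}")
```

Since "similar" chunks are not identical in length, a real check would call get_num_tokens() on the fully assembled prompt rather than multiplying one measurement.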