Code Advanced medium · 5 min

Token counting with get_num_tokens()

What you will learn

Count tokens in text before sending to an LLM to predict cost and enforce context window limits.

Why this matters

LLM pricing is token-based, and context windows are finite. Counting tokens upfront lets you estimate cost, avoid quota overages, and decide whether to chunk or summarize input before API calls.

Skip if: you're working with a local, quantized model with no per-token pricing (the count is still accurate but irrelevant to your cost model), or you're building a prototype that hasn't reached the production scale where cost tracking matters.

Explanation

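get_num_tokens() is a method on LangChain LLM and chat model instances that tokenizes a string and returns its token count as an integer.

How it works: Each integration is associated with a tokenizer (BPE-based for GPT models, often SentencePiece for others). You pass raw text, the tokenizer breaks it into tokens, and you get back the count. How closely that count matches what the provider bills depends on the integration: ChatOpenAI counts locally with tiktoken and makes no API call, while an integration that does not supply its own tokenizer falls back to a generic default tokenizer, which yields an approximation rather than an exact figure.

When to use it: Before batching large documents, after building a prompt but before sending it to the API, or when implementing a token-aware retrieval strategy.

The count is deterministic for the same model and tokenizer version, which makes it useful for precalculation and testing.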
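The "after building a prompt but before sending it" check can be sketched as a small helper. This is a minimal sketch, not part of LangChain: fits_in_context and whitespace_counter are hypothetical names, and the whitespace counter is only a stand-in so the example runs without an API key. In real code you would pass llm.get_num_tokens as the counter.

```python
from typing import Callable


def fits_in_context(
    count_tokens: Callable[[str], int],
    prompt: str,
    context_window: int,
    reserved_for_output: int = 0,
) -> bool:
    """Return True if the prompt plus the reserved output budget fits the window."""
    return count_tokens(prompt) + reserved_for_output <= context_window


def whitespace_counter(text: str) -> int:
    # Stand-in for llm.get_num_tokens (assumption for illustration only).
    # Real tokenizers do not split on whitespace.
    return len(text.split())


prompt = "Summarize the following report in three bullet points."

# 8 counted units + 4 reserved = 12, which fits a window of 16
print(fits_in_context(whitespace_counter, prompt, context_window=16, reserved_for_output=4))  # True

# The same 12 units do not fit a window of 10
print(fits_in_context(whitespace_counter, prompt, context_window=10, reserved_for_output=4))  # False
```

Reserving part of the window for the response matters because the context window typically covers input and output together.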

Analogy

Think of tokens like postage: before mailing a stack of letters, you weigh them to know the cost. get_num_tokens() is the scale. You check the weight (token count) before committing to the shipment (API call).

Code

Illustrative only - not runnable without a valid API key

python

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

# Replace the placeholder with your key, or omit api_key and set OPENAI_API_KEY
llm_openai = ChatOpenAI(model="gpt-4o-mini", api_key="your-key-here")
text = "The quick brown fox jumps over the lazy dog. This is a test string to count tokens."

# Count tokens for the sample text with the OpenAI model
token_count_openai = llm_openai.get_num_tokens(text)
print(f"GPT-4o-mini tokens: {token_count_openai}")

# No api_key is passed here, so ChatAnthropic reads ANTHROPIC_API_KEY from the environment
llm_claude = ChatAnthropic(model="claude-3-5-sonnet-20241022")
token_count_claude = llm_claude.get_num_tokens(text)
print(f"Claude tokens: {token_count_claude}")

print(f"\nToken difference: {abs(token_count_openai - token_count_claude)} (models tokenize differently)")

# Repeat the 17-word sample 50 times to simulate a longer document
long_text = " ".join([text] * 50)
long_count = llm_openai.get_num_tokens(long_text)
word_count = len(long_text.split())
print(f"\n{word_count}-word passage: {long_count} tokens")

# Conservative threshold: below GPT-4o-mini's 128K window, leaving room for the response
if long_count > 100000:
    print("Would exceed context window: chunk before sending")
else:
    print("Safe to send in single request")

Output

GPT-4o-mini tokens: 21
Claude tokens: 29

Token difference: 8 (models tokenize differently)

850-word passage: 1050 tokens
Safe to send in single request

What just happened?

The code instantiated two different LLM clients (OpenAI and Anthropic), called get_num_tokens() on the same text string using each model's native tokenizer, and got back different counts because each model uses a different tokenization scheme. It then repeated the text 50 times and checked whether the resulting token count would fit in context, demonstrating the real use case: pre-flight checks before API calls.
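When the pre-flight check fails, the usual next step is chunking. The following is a rough sketch of one possible approach (greedy packing of paragraphs under a token budget), not the method the lesson's code uses. chunk_by_token_budget and whitespace_counter are hypothetical names, and the counter is a stand-in for llm.get_num_tokens.

```python
from typing import Callable, List


def chunk_by_token_budget(
    paragraphs: List[str],
    count_tokens: Callable[[str], int],
    max_tokens: int,
) -> List[str]:
    """Greedily pack paragraphs into chunks whose counted size stays within max_tokens.

    A single paragraph that is larger than max_tokens on its own is emitted
    as its own oversized chunk; splitting it further is left to the caller.
    """
    chunks: List[str] = []
    current: List[str] = []
    for paragraph in paragraphs:
        candidate = "\n\n".join(current + [paragraph])
        if current and count_tokens(candidate) > max_tokens:
            # Adding this paragraph would overflow, so close the current chunk
            chunks.append("\n\n".join(current))
            current = [paragraph]
        else:
            current.append(paragraph)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def whitespace_counter(text: str) -> int:
    # Stand-in for llm.get_num_tokens (assumption for illustration only)
    return len(text.split())


paragraphs = ["one two three", "four five", "six seven eight nine", "ten"]
chunks = chunk_by_token_budget(paragraphs, whitespace_counter, max_tokens=5)
print(len(chunks))  # 2
```

Counting the joined candidate, rather than summing per-paragraph counts, keeps the check honest about separators, which also consume tokens with a real tokenizer.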

Common gotcha

Token counts vary significantly between models. GPT-4o-mini and Claude 3.5 Sonnet tokenize the same text differently, sometimes by 20-30% or more (21 versus 29 tokens for the sample text in the output above). If you count tokens with one model and send the text to another, your calculations are wrong. Always count tokens using the exact LLM instance you'll invoke, not a different model.
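To put a number on the gap, here is the arithmetic for the two counts from the sample output, as a small sketch. relative_difference is a hypothetical helper, and measuring against the smaller count is one convention among several.

```python
def relative_difference(count_a: int, count_b: int) -> float:
    """Gap between two token counts, as a fraction of the smaller count."""
    smaller = min(count_a, count_b)
    if smaller <= 0:
        raise ValueError("token counts must be positive")
    return abs(count_a - count_b) / smaller


# Counts taken from the sample output: 21 (GPT-4o-mini) and 29 (Claude)
gap = relative_difference(21, 29)
print(f"{gap:.0%}")  # 38%
```

A gap of that size is enough to turn a request that looks safe under one model's count into one that overflows under another's.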

Error recovery

AttributeError: 'ChatOpenAI' object has no attribute 'get_num_tokens'

You're using an older langchain version (< 0.1.0) or an LLM that doesn't support token counting (e.g., local Ollama). Upgrade langchain to 1.2.x or use an API-based model like ChatOpenAI or ChatAnthropic.

TypeError: get_num_tokens() missing 1 required positional argument: 'text'

You called get_num_tokens() with no argument. Pass the string to count: llm.get_num_tokens("your text here")

APIConnectionError or AuthenticationError

Your API key is missing, expired, or invalid. Pass api_key="..." to the LLM constructor or set the environment variable (e.g., OPENAI_API_KEY for OpenAI).
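If you want preprocessing code to keep working when counting is unavailable, one option is a defensive wrapper with a rough fallback. This is a sketch under stated assumptions: safe_token_count is a hypothetical helper, the classes below are stand-ins rather than real LangChain models, and the "about 4 characters per token" fallback is a common rule of thumb for English text, not an exact figure.

```python
from typing import Tuple


def safe_token_count(llm: object, text: str) -> Tuple[int, bool]:
    """Return (count, from_model). from_model is False when a rough estimate was used."""
    counter = getattr(llm, "get_num_tokens", None)
    if callable(counter):
        try:
            return counter(text), True
        except ImportError:
            # For example, an optional tokenizer package is not installed
            pass
    # Rough heuristic (assumption): about 4 characters per token for English text
    return max(1, len(text) // 4), False


class WordCounter:
    # Stand-in for a model object that supports counting
    def get_num_tokens(self, text: str) -> int:
        return len(text.split())


class NoCounter:
    # Stand-in for an object without get_num_tokens
    pass


text = "count these five short words"
print(safe_token_count(WordCounter(), text))  # (5, True)
print(safe_token_count(NoCounter(), text))    # (7, False)
```

Returning the from_model flag lets callers treat estimated counts more conservatively, for example by applying a larger safety margin.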

Experienced dev note

For integrations that tokenize locally, such as ChatOpenAI with tiktoken, get_num_tokens() is free: it counts tokens without hitting the model API. That means you can call it thousands of times for cost modeling, chunk size calculation, and batch optimization without spending money. Many engineers assume it is a billed API call and avoid using it. Check how your integration counts tokens, and where counting is local, use it liberally in preprocessing and validation logic.
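Cost modeling over a batch then reduces to summing counts and multiplying by a price. A minimal sketch follows: estimate_input_cost and whitespace_counter are hypothetical names, and the price of 2.00 USD per million tokens is a made-up figure for illustration, so substitute your provider's current rate and pass llm.get_num_tokens as the counter.

```python
from typing import Callable, Iterable, Tuple


def estimate_input_cost(
    documents: Iterable[str],
    count_tokens: Callable[[str], int],
    usd_per_million_tokens: float,
) -> Tuple[int, float]:
    """Return (total_tokens, estimated_cost_usd) for the input side of a batch."""
    total_tokens = sum(count_tokens(doc) for doc in documents)
    return total_tokens, total_tokens * usd_per_million_tokens / 1_000_000


def whitespace_counter(text: str) -> int:
    # Stand-in for llm.get_num_tokens (assumption for illustration only)
    return len(text.split())


# 2,000 short documents, 5 counted units per pair
documents = ["alpha beta gamma", "delta epsilon"] * 1000

# Hypothetical price, not a real provider rate
total, cost = estimate_input_cost(documents, whitespace_counter, usd_per_million_tokens=2.00)
print(f"{total} tokens, ${cost:.4f}")  # 5000 tokens, $0.0100
```

This covers input tokens only; output tokens are usually priced separately and cannot be counted before the response exists.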

Check your understanding

You're building a retrieval pipeline that chunks documents and sends them to GPT-4o-mini. You count tokens for a single chunk and get 8,000. Then you retrieve 15 similar chunks and want to send them all as context in one request. What would you check before making that API call, and why might your check fail if you used a different model to count tokens?

Show answer hint

A correct answer should mention: (1) checking total token count against GPT-4o-mini's context window (128K), (2) understanding that if you had counted tokens with Claude instead, the total would be different and your calculation would be invalid, (3) the need to always use the same model instance for counting as for invocation.

VERSION langchain >= 0.1.0. In langchain 0.0.x, get_num_tokens() was not consistently available across all LLM types. langchain-core 0.3.x stabilized the method signature. If upgrading from < 0.1.0, ensure you're using the new import structure: from langchain_openai import ChatOpenAI, not from langchain.chat_models.

Once you know token counts, learn to stream responses with .stream() to handle large outputs efficiently without loading all tokens into memory at once.
