How to use tiktoken for OpenAI token counting
Quick answer
Use the tiktoken Python library to tokenize text and count tokens for OpenAI models exactly. Load the appropriate encoding with tiktoken.get_encoding() or tiktoken.encoding_for_model(), then encode your text and count the tokens with len().
Prerequisites
- Python 3.8+
- pip install tiktoken
- Basic Python knowledge
Setup
Install the tiktoken library via pip to enable token counting compatible with OpenAI models.
pip install tiktoken

output

Collecting tiktoken
  Downloading tiktoken-0.4.0-py3-none-any.whl (1.2 MB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.4.0
Step by step
This example shows how to count tokens for a text string using tiktoken with the gpt-4o encoding.
import tiktoken
# Choose encoding for your model
encoding = tiktoken.encoding_for_model("gpt-4o")
text = "Hello, how many tokens am I using?"
# Encode text to tokens
tokens = encoding.encode(text)
# Count tokens
print(f"Token count: {len(tokens)}")

output
Token count: 9
Common variations
- Use tiktoken.get_encoding("cl100k_base") for the encoding used by gpt-3.5-turbo and gpt-4 models (gpt-4o and newer use o200k_base).
- Count tokens for chat messages by encoding the content strings individually or concatenated.
- Use tiktoken.encoding_for_model() to automatically select the right encoding for your model.
import tiktoken
# Base encoding
base_encoding = tiktoken.get_encoding("cl100k_base")
text = "Hello, world!"
tokens = base_encoding.encode(text)
print(f"Base encoding tokens: {len(tokens)}")
# Encoding for a different model
encoding = tiktoken.encoding_for_model("gpt-4o-mini")
tokens = encoding.encode(text)
print(f"gpt-4o-mini tokens: {len(tokens)}")

output

Base encoding tokens: 4
gpt-4o-mini tokens: 4
Troubleshooting
- If you get a KeyError when calling encoding_for_model(), update tiktoken to the latest version; older releases do not recognize newer model names.
- Token counts may differ between models that use different encodings; always use the encoding matching your target model.
- For chat completions, count tokens for all message parts (role, content) to estimate usage accurately.
Key Takeaways
- Use tiktoken.encoding_for_model() to get the correct tokenizer for your OpenAI model.
- Count tokens by encoding text and measuring the length of the token list.
- Keep tiktoken updated to support new models and encodings.
- Token counting helps manage context window limits and estimate API usage costs.