How to use tiktoken for OpenAI token counting
Quick answer
Use the tiktoken Python library to tokenize text and count tokens for OpenAI models exactly. Load the appropriate encoding with tiktoken.get_encoding() or tiktoken.encoding_for_model(), then encode your text and count the tokens with len().
Prerequisites
- Python 3.8+
- pip install tiktoken
- Basic Python knowledge
Setup
Install the tiktoken library via pip to enable token counting compatible with OpenAI models.
pip install tiktoken

output

Collecting tiktoken
  Downloading tiktoken-0.4.0-py3-none-any.whl (1.2 MB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.4.0
Step by step
This example shows how to count tokens for a text string using tiktoken with the gpt-4o encoding.
import tiktoken
# Choose encoding for your model
encoding = tiktoken.encoding_for_model("gpt-4o")
text = "Hello, how many tokens am I using?"
# Encode text to tokens
tokens = encoding.encode(text)
# Count tokens
print(f"Token count: {len(tokens)}")

output
Token count: 9
Common variations
- Use tiktoken.get_encoding("cl100k_base") for the encoding used by gpt-3.5-turbo and gpt-4 models (gpt-4o and newer use o200k_base).
- Count tokens for chat messages by encoding the content strings individually or concatenated.
- Use tiktoken.encoding_for_model() to automatically select the right encoding for your model.
import tiktoken
# Base encoding
base_encoding = tiktoken.get_encoding("cl100k_base")
text = "Hello, world!"
tokens = base_encoding.encode(text)
print(f"Base encoding tokens: {len(tokens)}")
# Encoding for a different model
encoding = tiktoken.encoding_for_model("gpt-4o-mini")
tokens = encoding.encode(text)
print(f"gpt-4o-mini tokens: {len(tokens)}")

output

Base encoding tokens: 4
gpt-4o-mini tokens: 4
Troubleshooting
- If you get a KeyError when calling encoding_for_model(), update tiktoken to the latest version; older releases do not recognize newer model names.
- Token counts may differ between models that use different encodings; always use the encoding matching your target model.
- For chat completions, count tokens for all message parts (role, content) to estimate usage accurately.
Key Takeaways
- Use tiktoken.encoding_for_model() to get the correct tokenizer for your OpenAI model.
- Count tokens by encoding text and measuring the length of the token list.
- Keep tiktoken updated to support new models and encodings.
- Token counting helps manage context window limits and estimate API usage costs.