ValueError
tiktoken.core.EncodingError
Stack trace
ValueError: token count mismatch: expected 4097 but got 4105
File "main.py", line 42, in generate_response
tokens = encoding.encode(prompt)
File "tiktoken/core.py", line 123, in encode
raise EncodingError("token count mismatch")
tiktoken.core.EncodingError: token count mismatch
Why it happens
This error occurs when the token count calculated by tiktoken for a given prompt or input text differs from the expected token count used to manage the model's context window. Causes include using an incorrect encoding for the model, changes in the tokenizer version, or misalignment between prompt construction and token counting logic.
Detection
Log the token count returned by tiktoken's encode method and compare it against the expected token count before sending requests to the model to catch mismatches early.
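A minimal sketch of that check, assuming a hypothetical helper named check_token_count and an expected value that your own code computes. Neither name comes from tiktoken. The encoder is passed in as a callable so the same function works with tiktoken.encoding_for_model(model).encode or with a stand-in during testing.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("token_check")


def check_token_count(
    prompt: str,
    expected: int,
    encode: Callable[[str], Sequence[int]],
) -> int:
    """Return the actual token count; log a warning if it differs from `expected`.

    Illustrative helper, not part of tiktoken.
    """
    actual = len(encode(prompt))
    if actual != expected:
        logger.warning(
            "token count mismatch: expected %d but got %d", expected, actual
        )
    else:
        logger.debug("token count ok: %d", actual)
    return actual


def tiktoken_encoder(model: str) -> Callable[[str], Sequence[int]]:
    """Build an encoder for `model`; tiktoken is imported lazily on first use."""
    import tiktoken

    return tiktoken.encoding_for_model(model).encode
```

A call such as check_token_count(prompt, expected, tiktoken_encoder("gpt-4o")) just before the request surfaces the mismatch in the logs instead of at the API boundary.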
Causes & fixes
Using the wrong tiktoken encoding for the model (e.g., cl100k_base, the encoding for gpt-3.5-turbo, instead of o200k_base, the encoding for gpt-4o)
Use the correct encoding by calling tiktoken.encoding_for_model with the exact model name you are using.
Manually counting tokens without accounting for special tokens or prompt formatting
Always use tiktoken's encode method on the full prompt string as sent to the model, including system and user messages.
Mismatch between prompt construction and token counting logic (e.g., counting tokens before adding stop sequences or suffixes)
Count tokens after fully constructing the prompt exactly as it will be sent to the API.
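The third fix can be sketched as follows. The prompt layout in build_prompt and both function names are assumptions for illustration. The point is only that the count is taken from the final string, after any suffix has been appended.

```python
from typing import Callable, Sequence, Tuple


def build_prompt(system: str, user: str, suffix: str = "") -> str:
    """Assemble the final prompt string (hypothetical layout for illustration)."""
    return f"{system}\n\n{user}{suffix}"


def count_final_prompt(
    system: str,
    user: str,
    suffix: str,
    encode: Callable[[str], Sequence[int]],
) -> Tuple[str, int]:
    """Build the prompt first, then count tokens on exactly that string."""
    prompt = build_prompt(system, user, suffix)
    return prompt, len(encode(prompt))
```

Pass tiktoken.encoding_for_model(model).encode as encode in real use. For chat-style requests the API wraps each message in a few extra formatting tokens, so a count taken on the concatenated text is best treated as a lower bound.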
Code: broken vs fixed
# Broken: the encoding does not match the model
import tiktoken

model = "gpt-4o"
encoding = tiktoken.get_encoding("gpt2")  # Wrong: GPT-2 vocabulary, not the one gpt-4o uses
prompt = "Hello, world!"
tokens = encoding.encode(prompt)  # This count will not match the model's own tokenization
print(f"Token count: {len(tokens)}")

# Fixed: look up the encoding from the model name
import tiktoken

model = "gpt-4o"
encoding = tiktoken.encoding_for_model(model)  # Resolves the encoding registered for this model
prompt = "Hello, world!"
tokens = encoding.encode(prompt)  # Count now reflects the model's tokenizer
print(f"Token count: {len(tokens)}")
Workaround
If you cannot fix the encoding immediately, catch the ValueError and fall back to a manual token count approximation, or truncate the prompt conservatively to avoid overflowing the context window.
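One way this fallback might look, as a rough sketch. The 4-characters-per-token ratio is a common rule of thumb for English text, and the 90% margin is an arbitrary assumption. Both should be tuned for your data, and all function names here are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence

CHARS_PER_TOKEN = 4          # assumption: rough average for English text
SAFETY_MARGIN_PERCENT = 90   # assumption: leave 10% headroom when estimating


def approx_token_count(text: str) -> int:
    """Estimate the token count from character length alone."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def safe_token_count(
    text: str,
    encode: Optional[Callable[[str], Sequence[int]]] = None,
) -> int:
    """Use the real encoder when it works; otherwise fall back to the estimate."""
    if encode is not None:
        try:
            return len(encode(text))
        except ValueError:
            pass  # fall through to the approximation
    return approx_token_count(text)


def truncate_conservatively(text: str, max_tokens: int) -> str:
    """Cut by characters so the estimated count stays below `max_tokens`."""
    budget_tokens = max_tokens * SAFETY_MARGIN_PERCENT // 100
    return text[: budget_tokens * CHARS_PER_TOKEN]
```

The estimate is deliberately pessimistic about headroom. It can still undercount for code, non-English text, or unusual symbols, so it is a stopgap and not a replacement for the correct encoding.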
Prevention
Always use tiktoken.encoding_for_model with the exact model name and count tokens on the fully constructed prompt string before sending to the API to prevent mismatches.