High severity · Beginner · Fix: 2-5 min

ValueError

langchain.text_splitter.TokenTextSplitter.ValueError

What this error means
TokenTextSplitter raises a ValueError when the encoding name passed to it is not recognized by tiktoken, the underlying tokenizer library.

Stack trace

traceback
ValueError: Encoding 'cl100k_base' not found. Please check the encoding name or install the required tokenizer package.
  File ".../langchain/text_splitter.py", line 123, in __init__
    self.tokenizer = tiktoken.get_encoding(encoding_name)
  File ".../tiktoken/__init__.py", line 45, in get_encoding
    raise ValueError(f"Encoding '{name}' not found.")
QUICK FIX
Make sure the encoding_name argument matches a valid tiktoken encoding name, and install or upgrade tiktoken to the latest version.

Why it happens

TokenTextSplitter relies on the tiktoken library to tokenize text with the encoding named by encoding_name. If that name is misspelled, or the installed tiktoken release is too old to include it, tiktoken rejects the name and the constructor raises this ValueError. A missing tiktoken package is a different failure: it typically surfaces as an ImportError rather than a ValueError.

Detection

Catch ValueError exceptions when initializing TokenTextSplitter or calling its methods, and log the encoding name used to identify unsupported or misspelled encodings before the app crashes.
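
One way to apply this is a small wrapper that logs the rejected encoding name and then re-raises. The sketch below is illustrative only: build_with_logging and its factory parameter are hypothetical names, not part of langchain. The constructor is passed in as an argument so the wrapper itself has no third-party imports.

```python
import logging

logger = logging.getLogger(__name__)


def build_with_logging(factory, encoding_name, **kwargs):
    """Call factory(encoding_name=..., **kwargs) and log the name if it is rejected.

    `factory` is expected to be a splitter class such as TokenTextSplitter.
    This helper is a hypothetical illustration, not a langchain API.
    """
    try:
        return factory(encoding_name=encoding_name, **kwargs)
    except ValueError:
        # Record which encoding name failed before the error propagates.
        logger.error("Splitter rejected encoding_name=%r", encoding_name)
        raise


# Intended usage (assumes langchain and tiktoken are installed):
#   from langchain.text_splitter import TokenTextSplitter
#   splitter = build_with_logging(TokenTextSplitter, "cl100k_base", chunk_size=200)
```

Re-raising keeps the failure visible to the caller while the log line preserves the exact string that was passed in.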

Causes & fixes

1

The encoding name passed to TokenTextSplitter is misspelled or invalid.

✓ Fix

Verify and correct the encoding name string to a valid encoding supported by tiktoken, such as 'cl100k_base' or 'r50k_base'.

2

The installed tiktoken package is outdated and does not include the requested encoding. (If tiktoken is not installed at all, expect an ImportError rather than this ValueError.)

✓ Fix

Install or upgrade tiktoken to the latest version using 'pip install -U tiktoken' to ensure all encodings are available.

3

Using a custom or unsupported encoding name not included in the tiktoken library.

✓ Fix

Switch to a standard encoding supported by tiktoken or implement a custom tokenizer compatible with TokenTextSplitter.
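
The first and third causes can both be caught before the splitter is constructed by checking the name against a list of known encodings. The sketch below is a hypothetical helper (check_encoding_name is not a langchain or tiktoken function). It takes the valid names as a parameter; in practice they could come from tiktoken.list_encoding_names().

```python
import difflib


def check_encoding_name(name, valid_names):
    """Return `name` if it is in `valid_names`, else raise ValueError with suggestions.

    Hypothetical helper for illustration; not part of langchain or tiktoken.
    """
    if name in valid_names:
        return name
    # Offer near-misses so a typo is easy to spot in the error message.
    suggestions = difflib.get_close_matches(name, valid_names, n=3)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unknown encoding {name!r}.{hint}")


# Intended usage (assumes tiktoken is installed):
#   import tiktoken
#   name = check_encoding_name("cl100k_base", tiktoken.list_encoding_names())
```

Validating up front gives a clearer message than the one raised from inside the constructor, and the list of names always reflects the tiktoken version that is actually installed.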

Code: broken vs fixed

Broken - triggers the error
python
from langchain.text_splitter import TokenTextSplitter

# This will raise ValueError if encoding is invalid
splitter = TokenTextSplitter(encoding_name='cl100k_base_wrong')  # triggers error
chunks = splitter.split_text("Some long text to split.")
Fixed - works correctly
python
from langchain.text_splitter import TokenTextSplitter

# Fixed: corrected encoding name
splitter = TokenTextSplitter(encoding_name='cl100k_base')  # corrected encoding
chunks = splitter.split_text("Some long text to split.")
print(chunks)
Corrected the encoding_name to a valid tiktoken encoding 'cl100k_base' so TokenTextSplitter can find and use the tokenizer without error.

Workaround

Wrap TokenTextSplitter initialization in try/except ValueError, and fall back to a simpler text splitter such as CharacterTextSplitter if the encoding is not found.
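
A minimal sketch of that fallback follows. splitter_with_fallback is a hypothetical helper name, and the chunk sizes in the usage comment are arbitrary illustrative values. The two constructors are passed as zero-argument callables so the helper can be exercised without langchain installed.

```python
import logging

logger = logging.getLogger(__name__)


def splitter_with_fallback(primary_factory, fallback_factory):
    """Return primary_factory(); if it raises ValueError, return fallback_factory().

    Hypothetical helper for illustration; not part of langchain.
    """
    try:
        return primary_factory()
    except ValueError as exc:
        # Make the downgrade visible instead of silently changing behaviour.
        logger.warning("Token splitter unavailable (%s); using fallback", exc)
        return fallback_factory()


# Intended usage (assumes langchain and tiktoken are installed):
#   from langchain.text_splitter import CharacterTextSplitter, TokenTextSplitter
#   splitter = splitter_with_fallback(
#       lambda: TokenTextSplitter(encoding_name="cl100k_base", chunk_size=200),
#       lambda: CharacterTextSplitter(chunk_size=800),
#   )
```

Note that CharacterTextSplitter measures chunk_size in characters by default, not tokens, so the fallback's chunk size needs to be chosen separately rather than copied from the token-based splitter.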

Prevention

Always verify encoding names against the tiktoken documentation and keep the tiktoken package updated to avoid missing encodings in chunking workflows.

Python 3.9+ · langchain >=0.1.0 · tested on 0.2.x
Verified 2026-04