High severity beginner · Fix: 2-5 min

ValueError

transformers.tokenization_utils_base.ValueError

What this error means
This error occurs when the tokenizer tries to pad sequences but no pad token is defined in the tokenizer configuration.

Stack trace

traceback
ValueError: This tokenizer does not have a pad token. Please set a pad token using tokenizer.pad_token = '<PAD>' or specify pad_token_id.
QUICK FIX
Set tokenizer.pad_token explicitly before padding, e.g., tokenizer.pad_token = '[PAD]' or tokenizer.pad_token = tokenizer.eos_token.

Why it happens

HuggingFace tokenizers require a pad token to pad input sequences to the same length. If the tokenizer model does not have a pad token set by default, attempts to pad sequences will raise this error. This often happens with models that do not include a pad token in their vocabulary or when the pad token is not explicitly assigned.

Detection

Check if tokenizer.pad_token is None before padding inputs, or catch ValueError during tokenization and log the tokenizer configuration to detect missing pad tokens early.

Causes & fixes

1

The tokenizer model does not have a default pad token set.

✓ Fix

Manually assign a pad token string to tokenizer.pad_token, e.g., tokenizer.pad_token = '[PAD]'.

2

Using a tokenizer for a model that inherently lacks a pad token (e.g., GPT-2).

✓ Fix

Set the pad token to the EOS token or another unused token: tokenizer.pad_token = tokenizer.eos_token.

3

Loading a tokenizer without specifying padding parameters or pad token id.

✓ Fix

Specify pad_token_id explicitly when calling tokenizer.pad or tokenizer.__call__ with padding.

Code: broken vs fixed

Broken - triggers the error
python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gpt2')
inputs = tokenizer(['Hello', 'Hello world'], padding=True)  # This line raises ValueError
Fixed - works correctly
python
import os
from transformers import AutoTokenizer

# No hardcoded keys needed here

tokenizer = AutoTokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token  # Set pad token to eos token to fix error
inputs = tokenizer(['Hello', 'Hello world'], padding=True)
print(inputs)
Assigned the pad token to the tokenizer's eos_token to enable padding, preventing the ValueError when padding sequences.

Workaround

Catch the ValueError during tokenization and manually pad sequences by adding a pad token string to inputs before tokenization or use tokenizer.pad_token_id with a custom padding implementation.

Prevention

Always verify that the tokenizer has a pad token set before using padding. For models without a default pad token, explicitly assign one during tokenizer initialization or configuration.

Python 3.7+ · transformers >=4.0.0 · tested on 4.30.0
Verified 2026-04
Verify ↗

Community Notes

No notes yetBe the first to share a version-specific fix or tip.