ValueError
transformers.tokenization_utils_base.ValueError
Stack trace
ValueError: Padding side mismatch: tokenizer.pad_token_side='left' but model expects 'right'. Please align tokenizer and model padding sides.
Why it happens
During fine-tuning, the tokenizer and model must agree on which side to apply padding tokens. If the tokenizer pads on the left but the model expects right padding (or vice versa), this conflict triggers a ValueError. This mismatch often happens when loading a tokenizer and model separately without syncing their padding configurations.
Detection
Check tokenizer.pad_token_side and model.config.pad_token_side before training; if they differ, a ValueError will be raised during batch encoding or model input preparation.
Causes & fixes
Tokenizer is set to pad on the left but the model expects right padding.
Set tokenizer.padding_side to match model.config.pad_token_side before encoding inputs.
Loading tokenizer and model from different pretrained checkpoints with inconsistent padding side defaults.
Load tokenizer and model from the same pretrained checkpoint or explicitly set padding_side on the tokenizer to match the model.
Custom tokenizer configuration overrides padding side to a value conflicting with the model.
Review and remove conflicting tokenizer config overrides or explicitly set padding_side to the model's expected value.
Code: broken vs fixed
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained('t5-small')
model = AutoModelForSeq2SeqLM.from_pretrained('t5-small')
# This will raise ValueError if padding sides conflict
inputs = tokenizer(['Hello world'], padding=True, return_tensors='pt') # Error here import os
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
os.environ['TRANSFORMERS_CACHE'] = '/tmp/transformers_cache'
tokenizer = AutoTokenizer.from_pretrained('t5-small')
model = AutoModelForSeq2SeqLM.from_pretrained('t5-small')
# Fix: align tokenizer padding side with model
if tokenizer.padding_side != model.config.pad_token_side:
tokenizer.padding_side = model.config.pad_token_side
inputs = tokenizer(['Hello world'], padding=True, return_tensors='pt')
print('Tokenization succeeded with aligned padding sides') Workaround
Catch the ValueError during tokenization, then programmatically set tokenizer.padding_side = model.config.pad_token_side and retry tokenization.
Prevention
Always load tokenizer and model from the same pretrained checkpoint and verify padding_side alignment before fine-tuning to avoid padding conflicts.