High severity intermediate · Fix: 2-5 min

ValueError

transformers.tokenization_utils_base.ValueError

What this error means

This error occurs when the tokenizer's padding side conflicts with the model's expected padding side during fine-tuning, causing a ValueError.

Stack trace

traceback

ValueError: Padding side mismatch: tokenizer.pad_token_side='left' but model expects 'right'. Please align tokenizer and model padding sides.

QUICK FIX

Set tokenizer.padding_side = model.config.pad_token_side before encoding inputs to ensure padding side consistency.

Why it happens

During fine-tuning, the tokenizer and model must agree on which side to apply padding tokens. If the tokenizer pads on the left but the model expects right padding (or vice versa), this conflict triggers a ValueError. This mismatch often happens when loading a tokenizer and model separately without syncing their padding configurations.

Detection

Check tokenizer.pad_token_side and model.config.pad_token_side before training; if they differ, a ValueError will be raised during batch encoding or model input preparation.

Causes & fixes

Tokenizer is set to pad on the left but the model expects right padding.

✓ Fix

Set tokenizer.padding_side to match model.config.pad_token_side before encoding inputs.

Loading tokenizer and model from different pretrained checkpoints with inconsistent padding side defaults.

✓ Fix

Load tokenizer and model from the same pretrained checkpoint or explicitly set padding_side on the tokenizer to match the model.

Custom tokenizer configuration overrides padding side to a value conflicting with the model.

✓ Fix

Review and remove conflicting tokenizer config overrides or explicitly set padding_side to the model's expected value.

Code: broken vs fixed

Broken - triggers the error

python

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained('t5-small')
model = AutoModelForSeq2SeqLM.from_pretrained('t5-small')

# This will raise ValueError if padding sides conflict
inputs = tokenizer(['Hello world'], padding=True, return_tensors='pt')  # Error here

Fixed - works correctly

python

import os
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

os.environ['TRANSFORMERS_CACHE'] = '/tmp/transformers_cache'

tokenizer = AutoTokenizer.from_pretrained('t5-small')
model = AutoModelForSeq2SeqLM.from_pretrained('t5-small')

# Fix: align tokenizer padding side with model
if tokenizer.padding_side != model.config.pad_token_side:
    tokenizer.padding_side = model.config.pad_token_side

inputs = tokenizer(['Hello world'], padding=True, return_tensors='pt')
print('Tokenization succeeded with aligned padding sides')

Aligned tokenizer.padding_side with model.config.pad_token_side to prevent padding side mismatch errors during tokenization.

⚠

Workaround

Catch the ValueError during tokenization, then programmatically set tokenizer.padding_side = model.config.pad_token_side and retry tokenization.

✓

Prevention

Always load tokenizer and model from the same pretrained checkpoint and verify padding_side alignment before fine-tuning to avoid padding conflicts.

Python 3.9+ · transformers >=4.0.0 · tested on 4.30.0

Verified 2026-04

Verify ↗

Community Notes

No notes yetBe the first to share a version-specific fix or tip.