Code Beginner easy · 4 min

Padding and truncation: max_length

What you will learn

Control how tokenizers handle sequences shorter or longer than your model expects by setting max_length and padding/truncation flags.

Why this matters

Transformer models consume batches as rectangular tensors, and every model has a maximum sequence length it can accept. Real-world text varies wildly in length, so you need to standardize it: otherwise batching fails, and over-long inputs crash the model or degrade its results. This is a prerequisite skill before running any inference or fine-tuning.

Skip if: Do NOT use truncation if you're working with summarization or question-answering tasks where cutting off text loses critical information: in those cases, chunk your input and process pieces separately. Do NOT use padding if you're building a streaming/real-time system where latency matters more than batch size, since padding adds compute.
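The chunking alternative mentioned above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `chunk_token_ids`, `chunk_size`, and `overlap` are hypothetical names chosen for this example, and the sketch ignores special tokens such as [CLS] and [SEP]. In practice, the Hugging Face tokenizer call accepts `return_overflowing_tokens=True` together with `stride` to produce overlapping windows for you.

```python
def chunk_token_ids(token_ids, chunk_size, overlap=0):
    """Split a list of token IDs into windows of at most chunk_size tokens.

    Consecutive windows share `overlap` tokens, so text that straddles a
    boundary still appears whole in at least one chunk.
    (Hypothetical helper for illustration only.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + chunk_size])
        if start + chunk_size >= len(token_ids):
            break  # this window already reached the end of the input
    return chunks


ids = list(range(10))  # stand-in for real token IDs
chunks = chunk_token_ids(ids, chunk_size=4, overlap=1)
print(chunks)
```

Each chunk can then be padded and run through the model on its own, and the per-chunk outputs combined in whatever way suits the task.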

Explanation

Padding and truncation are tokenizer operations that force variable-length text into a fixed sequence length. When you set max_length=512 together with padding='max_length' and truncation=True, the tokenizer adds padding tokens (typically [PAD]) to sequences shorter than 512 tokens and removes tokens from the end of sequences longer than 512 tokens.

Mechanically: the tokenizer converts text → tokens, then applies truncation (keeps the first N tokens, counting special tokens such as [CLS] and [SEP], and discards the rest) and padding (appends [PAD] tokens until the target length is reached). You control this with the padding, truncation, and max_length=N parameters.

When to use: set truncation=True, with a max_length no larger than the model's limit, whenever inputs may be too long. Use padding='max_length' when every batch must have the same fixed shape. Use padding='longest' (equivalent to padding=True) to pad only up to the longest sequence in the current batch: all sequences in that batch still end up the same length, but you skip padding that no example needs. A single example processed on its own needs no padding at all.
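To make the mechanics concrete, here is a toy re-implementation of truncation and padding on plain lists of token IDs. It is a sketch under simplifying assumptions, not the tokenizer's actual code: `pad_and_truncate` is a hypothetical helper, it pads on the right with ID 0, and it does not reserve room for special tokens the way a real tokenizer does.

```python
def pad_and_truncate(batch, max_length, strategy="max_length", pad_id=0):
    """Return (input_ids, attention_mask) for a batch of token-ID lists.

    strategy="max_length": every row is padded to max_length.
    strategy="longest":    every row is padded to the longest row in the
                           batch, measured after truncation.
    (Hypothetical helper for illustration only.)
    """
    if strategy not in ("max_length", "longest"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Truncation: keep the first max_length IDs, drop the rest.
    truncated = [ids[:max_length] for ids in batch]

    if strategy == "max_length":
        target = max_length
    else:
        target = max(len(ids) for ids in truncated)

    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = target - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)             # pad on the right
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real, 0 = pad
    return input_ids, attention_mask


batch = [[11, 12], [21, 22, 23, 24, 25, 26, 27], [31, 32, 33]]

fixed_ids, fixed_mask = pad_and_truncate(batch, max_length=5, strategy="max_length")
dyn_ids, dyn_mask = pad_and_truncate(batch[::2], max_length=5, strategy="longest")

print(fixed_ids)  # every row has length 5
print(dyn_ids)    # every row has the length of the longest row in this batch
```

With the fixed strategy every row has length 5 regardless of content; with the dynamic strategy the two short rows are padded only to length 3, which is the compute saving that padding to the longest sequence buys you.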

Analogy

Think of <code>max_length</code> like a standardized form with a fixed number of lines. If your application is short, you fill empty lines with nothing (padding). If your application is too long, you cut it off mid-sentence (truncation). Your form processor doesn't care which case applies: it just expects a form of standard size.

Code

python

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

text_short = 'Hello world'
text_long = 'The quick brown fox jumps over the lazy dog ' * 30

result_short = tokenizer(
    text_short,
    max_length=20,
    padding='max_length',
    truncation=True,
    return_tensors='pt'
)

result_long = tokenizer(
    text_long,
    max_length=20,
    padding='max_length',
    truncation=True,
    return_tensors='pt'
)

print('Short text token count:', result_short['input_ids'].shape[1])
print('Short text:', result_short['input_ids'])
print('\nLong text token count:', result_long['input_ids'].shape[1])
print('Long text (first 20 tokens):', result_long['input_ids'])
print('\nAttention mask (short):', result_short['attention_mask'])
print('Attention mask (long):', result_long['attention_mask'])

Output

Short text token count: 20
Short text: tensor([[ 101, 7592, 2088,  102,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0]])

Long text token count: 20
Long text (first 20 tokens): tensor([[ 101, 1996, 4248, 2829, 4419, 14523, 2058, 1996, 13971, 3899, 1996,
         4248, 2829, 4419, 14523, 2058, 1996, 13971, 3899, 102]])

Attention mask (short): tensor([[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Attention mask (long): tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])

What just happened?

The tokenizer converted two strings (one short, one long) into fixed-length sequences of 20 tokens each. The short text was padded with token ID 0 ([PAD]) after the [SEP] token (102) that closes the sentence. The long text was truncated to exactly 20 tokens: [CLS] (101), the first two repetitions of the sentence, and [SEP] (102); everything after that was discarded. The <code>attention_mask</code> shows which positions are real tokens (1) versus padding (0), telling the model to ignore padded positions during computation.

Common gotcha

Developers often forget to pass attention_mask in the model call. If you set padding='max_length' but pass only input_ids to your model, the model treats [PAD] tokens as real input, corrupting your results. Always pass both input_ids and attention_mask from the tokenizer output to the model; unpacking the tokenizer output with model(**inputs) does this for you.
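The effect of the mask can be seen in a toy version of the softmax step inside attention. This is a simplified sketch, not how transformers implements it: `attention_weights` is a hypothetical helper, the scores are made-up numbers, and real models typically add a large negative value to masked scores rather than using an explicit infinity.

```python
import math


def attention_weights(scores, attention_mask=None):
    """Softmax over raw attention scores, optionally ignoring padding.

    Hypothetical helper for illustration; assumes at least one position
    is unmasked.
    """
    if attention_mask is not None:
        # Masked positions get -inf, so exp() maps them to exactly 0.
        scores = [s if m == 1 else -math.inf
                  for s, m in zip(scores, attention_mask)]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 1.5, 1.5]  # last two positions are padding
mask = [1, 1, 0, 0]

unmasked = attention_weights(scores)     # padding receives real weight
masked = attention_weights(scores, mask)  # padding receives zero weight

print([round(w, 3) for w in unmasked])
print([round(w, 3) for w in masked])
```

Without the mask, the two padded positions absorb part of the attention weight; with it, all of the weight is redistributed across the real tokens.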

Error recovery

ValueError: Expected input batch_size (1) to match target batch_size (8)

The inputs and the labels reaching the loss function have different batch sizes, typically because texts were tokenized one at a time while the labels were batched. Tokenize the whole batch at once by passing a list, tokenizer(texts, ...), rather than looping, so that inputs and labels line up.

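Token indices sequence length is longer than the specified maximum sequence length for this model (2048 > 512). Running this sequence through the model will result in indexing errors

This is a warning from the tokenizer, not an exception: truncation was off (<code>truncation=False</code>, or omitted), and your text exceeds the model's maximum length. Add <code>truncation=True</code> to your tokenizer call, or the model will fail once it receives the over-long sequence.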

ValueError: Padding must be truncated to a multiple of 8

Some models (quantized ones, for example) require sequence lengths to be multiples of 8. Set <code>max_length=512</code> instead of 500, or pass <code>pad_to_multiple_of=8</code> to the tokenizer call so that padded lengths are rounded up for you.
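If you want to pick a compliant max_length yourself, the rounding is a one-liner. The helper below is a hypothetical sketch for illustration; the tokenizer's own <code>pad_to_multiple_of</code> argument is the usual way to get the same effect during padding.

```python
def round_up_to_multiple(length, multiple=8):
    """Return the smallest multiple of `multiple` that is >= length.

    Hypothetical helper for illustration only.
    """
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    # Ceiling division via negated floor division, then scale back up.
    return -(-length // multiple) * multiple


print(round_up_to_multiple(500))  # 504
print(round_up_to_multiple(512))  # 512
```

A length of 500 is rounded up to 504, while 512 is already a multiple of 8 and stays unchanged.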

Experienced dev note

In transformers 5.5.x, as in earlier releases, the tokenizer returns attention_mask automatically for models that use one: you don't opt in. But many developers still loop through datasets without batching, tokenizing one example at a time. This defeats the purpose of padding: your GPU sits idle. Always tokenize in batches: tokenizer([text1, text2, ...], padding='max_length', ...). Also: never set max_length larger than the limit your model was trained with (e.g., BERT with max_length=2048 will fail: it has position embeddings for only 512 tokens). Check your model card.

Check your understanding

You tokenize a batch of 8 texts with max_length=128, padding='max_length', truncation=True. Three of them have only 50 tokens after tokenization. What does the attention_mask look like for those three, and why does the model need it?

Show answer hint

A correct answer explains that the attention_mask for the short texts has 1s in positions 0 through 49 and 0s in positions 50 through 127 (the 78 padded positions, counting from zero), and that the model uses this to ignore [PAD] tokens during attention computation so they don't influence the learned representation.

VERSION The single-call API, where padding, truncation, and max_length are arguments of one tokenizer call, dates back to transformers 3.0; it is not new in 4.30.0. It works the same way in later releases, including 5.5.x. Also, `return_attention_mask` defaults to `None`, which means the mask is returned whenever the model's tokenizer expects one (as BERT's does), so for those models it has never been opt-in.

Next, learn about special tokens and token type IDs: understanding how <code>[CLS]</code>, <code>[SEP]</code>, and segment IDs affect model input will deepen your control over what the tokenizer produces.
