Code Beginner easy · 4 min

tokenizer.batch_decode(): decoding outputs

What you will learn
Convert a batch of token IDs back into human-readable text using your tokenizer's batch_decode() method.

Why this matters

After your model generates token IDs, you need to convert them back into text that humans can read. batch_decode() does this for multiple sequences in one call, which makes it a standard step in inference pipelines.

Skip if: You're only decoding a single sequence (use tokenizer.decode(), singular, instead), or you're using a pipeline() object, which already handles decoding internally.

Explanation

What it is: tokenizer.batch_decode() takes a batch (list or tensor) of token ID sequences and converts them back into text strings. It's the inverse operation of tokenization.

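How it works: Each token ID is mapped back to its corresponding token string via the tokenizer's vocabulary, and the token strings are then joined into text. Special tokens like [CLS], [SEP], or padding tokens are kept or dropped according to the parameters you pass in (for example, skip_special_tokens). The method handles the whole batch in a single call; whether that is actually faster than calling decode() in a loop depends on the tokenizer implementation and the library version.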
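
The lookup described above can be sketched in plain Python. This is a toy illustration, not the library's implementation: the vocabulary TOY_VOCAB, the special-ID set TOY_SPECIAL_IDS, and the functions toy_decode and toy_batch_decode are all hypothetical, and a real tokenizer also merges subword pieces and handles byte-level encodings.

```python
# Toy illustration only: a made-up vocabulary, not a real tokenizer's.
TOY_VOCAB = {
    0: "[PAD]", 1: "[CLS]", 2: "[SEP]",
    3: "the", 4: "cat", 5: "sat", 6: "dog",
}
TOY_SPECIAL_IDS = {0, 1, 2}


def toy_decode(ids, skip_special_tokens=False):
    """Map each ID to its token string and join the tokens with spaces."""
    tokens = [
        TOY_VOCAB[i]
        for i in ids
        if not (skip_special_tokens and i in TOY_SPECIAL_IDS)
    ]
    return " ".join(tokens)


def toy_batch_decode(batch, skip_special_tokens=False):
    """Decode every sequence in the batch; always returns a list of strings."""
    return [toy_decode(ids, skip_special_tokens) for ids in batch]


batch = [
    [1, 3, 4, 5, 2],  # [CLS] the cat sat [SEP]
    [1, 3, 6, 2, 0],  # [CLS] the dog [SEP] [PAD]
]
print(toy_batch_decode(batch, skip_special_tokens=True))
print(toy_batch_decode(batch, skip_special_tokens=False))
```

With skip_special_tokens=True the markers and padding disappear from both strings; with False they appear verbatim in the text.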

When to use it: This is your standard tool after generation or model inference when you have multiple outputs you need to read. It's part of the normal inference loop: input text → tokenize → model → token IDs → batch_decode → human text.

Analogy

Think of tokenization as breaking a sentence into numbered Scrabble tiles, and batch_decode() as the reverse: you have several rows of numbered tiles, one row per sentence, and you look each number up in a dictionary to reconstruct the original sentences.

Code

python
from transformers import AutoTokenizer
import torch

# Load the GPT-2 tokenizer (downloads the vocabulary on first use)
tokenizer = AutoTokenizer.from_pretrained('gpt2')

# A batch of two sequences with four token IDs each (shape [2, 4])
token_ids = torch.tensor([
    [464, 995, 318, 1802],
    [464, 1487, 468, 257]
])

print('Token IDs:')
print(token_ids)
print()

# Decode the whole batch, dropping special tokens such as <|endoftext|>
decoded = tokenizer.batch_decode(token_ids, skip_special_tokens=True)
print('Decoded text:')
for i, text in enumerate(decoded):
    print(f'  Sequence {i}: {text}')
print()

# Decode again, this time keeping any special tokens in the output
without_skip = tokenizer.batch_decode(token_ids, skip_special_tokens=False)
print('With special tokens included:')
for i, text in enumerate(without_skip):
    print(f'  Sequence {i}: {text}')
Output
Token IDs:
tensor([[ 464,  995,  318, 1802],
        [ 464, 1487,  468,  257]])

Decoded text:
  Sequence 0: The world is 100
  Sequence 1: The change has a

With special tokens included:
  Sequence 0: The world is 100
  Sequence 1: The change has a

What just happened?

We loaded the GPT-2 tokenizer, defined two sequences of token IDs as a tensor, and called batch_decode() twice. Each call returned a list of 2 strings. The first call, with skip_special_tokens=True, drops any special tokens (such as padding or end-of-text markers) from the text. The second call, with skip_special_tokens=False, keeps every token. Because neither sequence here contains a special-token ID, the two outputs are identical.

Common gotcha

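The most common mistake is passing a single sequence (a 1D tensor or flat list) to batch_decode() and expecting a string back. batch_decode() always returns a list, even for a batch of size 1. In transformers 4.x releases, a 1D input is also treated as a batch of one-token sequences, so you get one string per token rather than one string for the whole sequence. If you have a single sequence, use tokenizer.decode() (singular) instead, or wrap the sequence in a list so it becomes a batch of size 1.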
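
A 1D input is easy to get wrong because a batch decoder that simply loops over its argument (transformers 4.x works this way) sees a flat list of IDs as a batch of one-token sequences. The sketch below mimics that behaviour with a hypothetical toy vocabulary and toy functions; it is not the library code, and real output depends on your tokenizer and version.

```python
# Hypothetical toy vocabulary; real token IDs and strings will differ.
TOY_VOCAB = {10: "hello", 11: "world", 12: "again"}


def toy_decode(ids):
    """Decode one sequence. A bare int is treated as a one-token sequence."""
    if isinstance(ids, int):
        ids = [ids]
    return " ".join(TOY_VOCAB[i] for i in ids)


def toy_batch_decode(batch):
    """Decode each element of the batch separately; always returns a list."""
    return [toy_decode(seq) for seq in batch]


single = [10, 11, 12]

per_token = toy_batch_decode(single)   # flat input: one string per token
wrapped = toy_batch_decode([single])   # wrapped input: one string per sequence
direct = toy_decode(single)            # singular form: a plain string

print(per_token)  # ['hello', 'world', 'again']
print(wrapped)    # ['hello world again']
print(direct)     # hello world again
```

Wrapping the sequence in a list, or calling the singular decode, avoids the surprise.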

Error recovery

TypeError: expected string or bytes-like object
You passed something other than integer token IDs (for example, strings). Ensure token_ids holds numeric token indices, either as a tensor or as lists of ints.
IndexError: index out of range
A token ID in your batch is outside the tokenizer's vocabulary. This usually means your model generated an invalid token ID. Check that every ID is within the range [0, len(tokenizer) - 1].
Expected 2D tensor, got 1D
You passed a single sequence (1D tensor) instead of a batch (2D tensor). Either add a batch dimension with .unsqueeze(0) or use the singular tokenizer.decode() method instead. Some versions raise no error here and return unexpected output instead, so check the input shape if the decoded list looks wrong.
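
A small pre-check can surface the first two problems before decoding. The helper below is a hypothetical sketch (find_invalid_ids is not part of transformers): it assumes a 2D batch given as nested lists or as any object with a .tolist() method, and it takes the vocabulary size as a plain integer, which in practice you would obtain from len(tokenizer).

```python
def find_invalid_ids(batch, vocab_size):
    """Return (row, column, value) for every entry that is not an int in [0, vocab_size)."""
    # Tensors and arrays expose .tolist(); plain nested lists pass through.
    rows = batch.tolist() if hasattr(batch, "tolist") else batch
    problems = []
    for r, seq in enumerate(rows):
        for c, token_id in enumerate(seq):
            is_int = isinstance(token_id, int) and not isinstance(token_id, bool)
            if not is_int or not 0 <= token_id < vocab_size:
                problems.append((r, c, token_id))
    return problems


# 50257 is GPT-2's vocabulary size, used here as an example value.
batch = [[464, 995, 318], [464, 60000, -1]]
print(find_invalid_ids(batch, vocab_size=50257))  # [(1, 1, 60000), (1, 2, -1)]
```

An empty result means every ID is a valid index; anything else tells you which row and position to inspect.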

Experienced dev note

In transformers 5.5.x, batch_decode() accepts both PyTorch tensors and lists of lists. Don't assume one input type decodes faster than the other; profile it if decoding shows up as a bottleneck. In production, default to skip_special_tokens=True unless you specifically need to debug token-level structure, because raw special tokens in output text confuse end users and downstream processing.

Check your understanding

You have a 2D tensor of shape [8, 32] containing token IDs from your model's output. Could you call tokenizer.decode(token_ids[i]) in a loop instead of batch_decode()? What would you have to handle yourself, and what's the trade-off?

Show answer hint

A correct answer explains that decode() expects a single 1D sequence, so you'd have to loop over all 8 sequences yourself and collect the results, which is more verbose and potentially slower than batch_decode(), which handles the whole batch in one call and returns a list of 8 strings. The key insight is recognizing the difference between the singular and plural forms of the method.

VERSION In transformers < 4.30.0, batch_decode() had inconsistent handling of PyTorch tensors on different devices: always convert to CPU first. In 4.30.0+, this is handled automatically. transformers 5.5.x (April 2026) standardized this further and added better support for all tensor types.
NEXT

Next, learn how to pass custom parameters like clean_up_tokenization_spaces to batch_decode() to handle whitespace artifacts left by the tokenizer.
