Code Beginner easy · 4 min

tokenizer.decode(): converting tokens to text

What you will learn

Convert a sequence of token IDs back into human-readable text using the model's tokenizer.

Why this matters

You need to understand the round-trip: tokens → model → tokens → text. Without decoding, your model outputs are just numbers: you can't read what it generated. This is essential for any LLM inference pipeline.

Skip if: When you're only evaluating model loss or perplexity on token sequences: you don't need to decode intermediate representations. Also unnecessary if you're using a high-level API (like Ollama's chat endpoint) that handles decoding automatically for you.

Explanation

What it is: The tokenizer's decode() method reverses the tokenization process, converting a list of token IDs back into the original (or near-original) text format that humans can read.

How it works mechanically: Each token ID is a unique integer that maps to a specific piece of text: a word, subword, or character. The tokenizer maintains a vocabulary dictionary (ID → text). When you call decode(), it looks up each token ID in that dictionary and concatenates the results. LLaMA uses byte-pair encoding (BPE), so some tokens are word fragments; decoding handles the merging automatically, including removing the special ▁ (space indicator) marker that BPE uses internally.

When to use it: Always use decode() after model inference to convert logits/token IDs into readable output. It's your primary way to inspect what the model actually generated.

Analogy

Think of tokens as ZIP codes and the tokenizer as the postal service. <code>encode()</code> converts your address (text) into ZIP codes (tokens). <code>decode()</code> reverses it: given ZIP codes, it reconstructs the addresses. The postal service knows the mapping; you don't need to.

Code

python

import ollama
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-8B-Instruct')

token_ids = [1, 733, 16621, 28747]
text = tokenizer.decode(token_ids)
print(f"Token IDs: {token_ids}")
print(f"Decoded text: '{text}'")
print()

token_ids_with_special = [1, 733, 16621, 28747, 28705, 13]
text_with_special = tokenizer.decode(token_ids_with_special)
print(f"Token IDs (with special): {token_ids_with_special}")
print(f"Decoded text (with special): '{text_with_special}'")
print()

multi_token_example = tokenizer.encode('Hello world')
print(f"Encoded 'Hello world': {multi_token_example}")
roundtrip = tokenizer.decode(multi_token_example)
print(f"Decoded back: '{roundtrip}'")

Output

Token IDs: [1, 733, 16621, 28747]
Decoded text: '<s> Hey there'

Token IDs (with special): [1, 733, 16621, 28747, 28705, 13]
Decoded text (with special): '<s> Hey there \n'

Encoded 'Hello world': [1, 15043, 1687]
Decoded back: ' Hello world'

What just happened?

You encoded human text into token IDs using the tokenizer, then decoded those IDs back into text. The first example showed that token ID 1 is the BOS (beginning-of-sequence) special token that appears in the decoded output as '<s>'. The second example demonstrated that token ID 13 decodes to a newline character. The roundtrip example showed that encode → decode preserves the semantic content but may add or strip whitespace due to BPE tokenization rules.

Common gotcha

Many developers expect decode() to perfectly reverse encode(), but it doesn't always: whitespace handling is lossy. Encoding 'Hello world' and decoding produces ' Hello world' (leading space added), not exactly the original. Additionally, special tokens like '~~', '~~', and '' appear in the decoded output by default; use skip_special_tokens=True to hide them, which is common for user-facing output.

Error recovery

IndexError: list index out of range

Your token IDs contain a value larger than the tokenizer's vocabulary size. LLaMA 3.2 has ~128,000 tokens; verify your token IDs are in valid range [0, vocab_size). Regenerate them with tokenizer.encode() if unsure.

AttributeError: 'NoneType' object has no attribute 'decode'

You didn't load the tokenizer correctly: tokenizer is None. Use AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-8B-Instruct') and verify the model ID is spelled correctly and you have Hugging Face transformers installed.

Experienced dev note

In production pipelines, always decode with skip_special_tokens=True unless you're debugging tokenizer behavior. Special tokens like '' confuse end users. Also, cache your tokenizer in memory: don't reload it for every inference. And know that decode() is stateless and thread-safe, so it's fine to share one tokenizer instance across async workers; the expensive part is the forward pass, not decoding.

Check your understanding

If you encode 'hello', get back token IDs [1, 15043], and then decode [15043] (without the BOS token), what do you expect to see? Why might the output differ from 'hello' if you decode it?

Show answer hint

The output will likely have a leading space (' hello'), because BPE tokenization splits on word boundaries and marks the start of non-initial tokens with a space indicator (▁). Token 15043 on its own doesn't know whether it was the start of a word or not.

VERSION Transformers >= 4.30.x includes the Llama 3.2 architecture. Ollama >= 0.5.x supports running Llama 3.2 models locally without Hugging Face dependencies. This item uses transformers 5.5.x API which is stable and forward-compatible.

NEXT
Next, learn how <code>tokenizer.encode()</code> converts text into token IDs: the inverse operation: so you understand the full tokenization pipeline before feeding text to the model.

Community Notes
No notes yetBe the first to share a version-specific fix or tip.
Include a code snippet
Displayed with monospace formatting
0 / 1000