How-to · Beginner · 3 min read

How to use beam search in Hugging Face

Quick answer
Call the generate() method from Hugging Face's transformers library with the num_beams parameter set to your desired beam width to enable beam search. For example, model.generate(input_ids, num_beams=5) keeps the 5 highest-scoring candidate sequences at each step and returns the best finished one.

Prerequisites

  • Python 3.8+
  • pip install "transformers>=4.30.0" (quoted so the shell doesn't treat >= as redirection)
  • pip install torch (or tensorflow)
  • Basic knowledge of Hugging Face Transformers

Setup

Install the transformers library and a backend like torch for PyTorch support. Set up your environment with the necessary packages.

bash
pip install transformers torch

Step by step

Load a pretrained model and tokenizer, encode your input text, and call generate() with num_beams to enable beam search. The output tokens are decoded back to text.

python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Encode input
input_text = "translate English to German: The house is wonderful."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate with beam search
outputs = model.generate(input_ids, num_beams=5, early_stopping=True)

# Decode output
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
output
Das Haus ist wunderbar.
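Beam search can also report how each finished hypothesis scored. A sketch building on the same t5-small setup: with return_dict_in_generate=True and output_scores=True, generate() returns an object whose sequences_scores field holds the final beam score (summed log-probability, adjusted by the length penalty) of each returned sequence.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

input_ids = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
).input_ids

# return_dict_in_generate exposes extra fields alongside the token ids
result = model.generate(
    input_ids,
    num_beams=5,
    early_stopping=True,
    return_dict_in_generate=True,
    output_scores=True,
)

text = tokenizer.decode(result.sequences[0], skip_special_tokens=True)
print(text, result.sequences_scores[0].item())
```

Higher (less negative) scores mean the model was more confident in that beam.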

Common variations

  • Use num_return_sequences to get multiple beam search outputs.
  • Set early_stopping=True to stop the search as soon as num_beams complete candidates are found.
  • Use num_beams=1 for greedy decoding (no beam search).
  • Apply beam search with other models like GPT-2 by setting num_beams in generate().
python
outputs = model.generate(input_ids, num_beams=3, num_return_sequences=3, early_stopping=True)
for i, output in enumerate(outputs):
    print(f"Output {i+1}:", tokenizer.decode(output, skip_special_tokens=True))
output
Output 1: Das Haus ist wunderbar.
Output 2: Das Haus ist sehr schön.
Output 3: Das Haus ist toll.
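Beam search works the same way for decoder-only models. A minimal sketch with the gpt2 checkpoint (the prompt is arbitrary; pad_token_id is set to the EOS id because GPT-2 has no pad token, which silences a generation warning):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The weather today is", return_tensors="pt").input_ids

# Beam search with 4 beams, continuing the prompt by up to 20 new tokens
outputs = model.generate(
    input_ids,
    num_beams=4,
    max_new_tokens=20,
    early_stopping=True,
    pad_token_id=tokenizer.eos_token_id,
)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

For causal models the returned sequence includes the prompt, so the decoded text starts with your input.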

Troubleshooting

  • If you get an error about missing backend, ensure torch or tensorflow is installed.
  • Beam search can be slower; reduce num_beams if performance is an issue.
  • For very long outputs, raise max_length (or set max_new_tokens) in generate() so results aren't truncated.
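For length control, max_new_tokens is often easier to reason about than max_length, because it bounds only the generated tokens rather than input plus output. A self-contained sketch with t5-small (the input sentence is just an example):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

input_ids = tokenizer(
    "summarize: Beam search keeps the top num_beams partial sequences at "
    "each step and returns the highest-scoring finished sequence.",
    return_tensors="pt",
).input_ids

# max_new_tokens caps generated tokens only, so a long input
# no longer eats into the output budget the way max_length can
outputs = model.generate(input_ids, num_beams=4, max_new_tokens=64, early_stopping=True)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
```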

Key Takeaways

  • Enable beam search in Hugging Face by setting num_beams in the generate() method.
  • Use num_return_sequences to retrieve multiple diverse outputs from beam search.
  • Install transformers and a backend like torch to run generation with beam search.
  • Adjust early_stopping and max_length parameters to control generation behavior.
  • Beam search improves output quality but increases generation time proportional to num_beams.
Verified 2026-04 · t5-small, gpt2