How to use beam search in Hugging Face
Quick answer
Use the generate() method from Hugging Face's transformers library with the num_beams parameter set to your desired beam width to enable beam search. For example, model.generate(input_ids, num_beams=5) performs beam search with 5 beams during text generation.

Prerequisites

- Python 3.8+
- pip install "transformers>=4.30.0" (quoted so the shell does not treat >= as a redirect)
- pip install torch (or tensorflow)
- Basic knowledge of Hugging Face Transformers
Setup
Install the transformers library and a backend such as torch for PyTorch support.

pip install transformers torch

Step by step
Load a pretrained model and tokenizer, encode your input text, and call generate() with num_beams to enable beam search. The output tokens are decoded back to text.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# Load model and tokenizer
model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
# Encode input
input_text = "translate English to German: The house is wonderful."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
# Generate with beam search
outputs = model.generate(input_ids, num_beams=5, early_stopping=True)
# Decode output
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

Output
Das Haus ist wunderbar.
Common variations
- Use num_return_sequences to get multiple beam search outputs.
- Adjust early_stopping to control when generation stops.
- Use num_beams=1 for greedy decoding (no beam search).
- Apply beam search with other models like GPT-2 by setting num_beams in generate().
outputs = model.generate(input_ids, num_beams=3, num_return_sequences=3, early_stopping=True)
for i, output in enumerate(outputs):
    print(f"Output {i+1}:", tokenizer.decode(output, skip_special_tokens=True))

Output

Output 1: Das Haus ist wunderbar.
Output 2: Das Haus ist sehr schön.
Output 3: Das Haus ist toll.
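The GPT-2 variation works the same way as the seq2seq example, just with a causal-LM class. A minimal sketch (the prompt and max_new_tokens value are illustrative; pad_token_id is set explicitly because GPT-2 has no pad token by default):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The weather today is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Beam search on a decoder-only model: generate() returns the prompt
# followed by the highest-scoring continuation across 3 beams.
outputs = model.generate(
    input_ids,
    num_beams=3,
    max_new_tokens=20,
    early_stopping=True,
    pad_token_id=tokenizer.eos_token_id,
)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

Note that for decoder-only models the decoded output includes the prompt itself, so strip the prompt prefix if you only want the continuation.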
Troubleshooting
- If you get an error about a missing backend, ensure torch or tensorflow is installed.
- Beam search can be slower; reduce num_beams if performance is an issue.
- For very long sequences, increase max_length in generate() to avoid truncation.
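If outputs come back cut off, raising the length cap fixes it. A sketch reusing the t5-small setup from the step-by-step section; max_new_tokens (an alternative to max_length) bounds only the newly generated tokens, so a long input does not eat into the output budget:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

input_text = "translate English to German: The house is wonderful."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Allow up to 128 newly generated tokens, independent of input length.
outputs = model.generate(
    input_ids,
    num_beams=5,
    max_new_tokens=128,
    early_stopping=True,
)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```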
Key Takeaways
- Enable beam search in Hugging Face by setting num_beams in the generate() method.
- Use num_return_sequences to retrieve multiple candidate outputs from beam search (for genuinely diverse candidates, see diverse beam search via num_beam_groups).
- Install transformers and a backend like torch to run generation with beam search.
- Adjust early_stopping and max_length parameters to control generation behavior.
- Beam search improves output quality but increases generation time roughly in proportion to num_beams.
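The takeaways above rest on what num_beams actually controls: at each step, beam search keeps the num_beams highest-scoring partial sequences instead of only the single best one. A library-free toy sketch (the vocabulary and its fixed per-step probabilities are made up for illustration; a real model recomputes scores from context at every step):

```python
import math


def beam_search(step_log_probs, num_beams, length):
    """Toy beam search over a fixed per-step log-probability table.

    Returns the num_beams best (sequence, cumulative log-prob) pairs.
    """
    beams = [([], 0.0)]  # start with one empty hypothesis
    for _ in range(length):
        # Expand every surviving hypothesis by every token...
        candidates = [
            (seq + [token], score + logp)
            for seq, score in beams
            for token, logp in step_log_probs.items()
        ]
        # ...then prune back to the num_beams best (the "beam").
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


vocab = {"a": math.log(0.5), "b": math.log(0.3), "c": math.log(0.2)}
best = beam_search(vocab, num_beams=2, length=3)
for seq, score in best:
    print("".join(seq), round(score, 3))
```

With num_beams=1 this degenerates to greedy decoding, which is why setting num_beams=1 in generate() disables beam search.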