How-to · Beginner · 3 min read

How to use beam search in Hugging Face

Quick answer
Call the generate() method from Hugging Face's transformers library with the num_beams parameter set to your desired beam width to enable beam search. For example, model.generate(input_ids, num_beams=5) keeps the 5 highest-scoring candidate sequences at each step and returns the best finished one.

Prerequisites

  • Python 3.8+
  • pip install "transformers>=4.30.0" (quoted so the shell doesn't treat >= as redirection)
  • pip install torch (or tensorflow)
  • Basic knowledge of Hugging Face Transformers

Setup

Install the transformers library and a backend like torch for PyTorch support. Set up your environment with the necessary packages.

bash
pip install transformers torch

Step by step

Load a pretrained model and tokenizer, encode your input text, and call generate() with num_beams to enable beam search. The output tokens are decoded back to text.

python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Encode input
input_text = "translate English to German: The house is wonderful."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate with beam search
outputs = model.generate(input_ids, num_beams=5, early_stopping=True)

# Decode output
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
output
Das Haus ist wunderbar.
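Beam search can also report how each finished hypothesis scored. A sketch building on the same t5-small setup: with return_dict_in_generate=True and output_scores=True, generate() returns an object whose sequences_scores field holds the final beam score (summed log-probability, adjusted by the length penalty) of each returned sequence.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

input_ids = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
).input_ids

# return_dict_in_generate exposes extra fields alongside the token ids
result = model.generate(
    input_ids,
    num_beams=5,
    early_stopping=True,
    return_dict_in_generate=True,
    output_scores=True,
)

text = tokenizer.decode(result.sequences[0], skip_special_tokens=True)
print(text, result.sequences_scores[0].item())
```

Higher (less negative) scores mean the model was more confident in that beam.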

Common variations

  • Use num_return_sequences to get multiple beam search outputs.
  • Set early_stopping=True to stop the search as soon as num_beams complete candidates are found.
  • Use num_beams=1 for greedy decoding (no beam search).
  • Apply beam search with other models like GPT-2 by setting num_beams in generate().
python
outputs = model.generate(input_ids, num_beams=3, num_return_sequences=3, early_stopping=True)
for i, output in enumerate(outputs):
    print(f"Output {i+1}:", tokenizer.decode(output, skip_special_tokens=True))
output
Output 1: Das Haus ist wunderbar.
Output 2: Das Haus ist sehr schön.
Output 3: Das Haus ist toll.
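Beam search works the same way for decoder-only models. A minimal sketch with the gpt2 checkpoint (the prompt is arbitrary; pad_token_id is set to the EOS id because GPT-2 has no pad token, which silences a generation warning):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The weather today is", return_tensors="pt").input_ids

# Beam search with 4 beams, continuing the prompt by up to 20 new tokens
outputs = model.generate(
    input_ids,
    num_beams=4,
    max_new_tokens=20,
    early_stopping=True,
    pad_token_id=tokenizer.eos_token_id,
)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

For causal models the returned sequence includes the prompt, so the decoded text starts with your input.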

Troubleshooting

  • If you get an error about missing backend, ensure torch or tensorflow is installed.
  • Beam search can be slower; reduce num_beams if performance is an issue.
  • For very long outputs, raise max_length (or set max_new_tokens) in generate() so results aren't truncated.
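For length control, max_new_tokens is often easier to reason about than max_length, because it bounds only the generated tokens rather than input plus output. A self-contained sketch with t5-small (the input sentence is just an example):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

input_ids = tokenizer(
    "summarize: Beam search keeps the top num_beams partial sequences at "
    "each step and returns the highest-scoring finished sequence.",
    return_tensors="pt",
).input_ids

# max_new_tokens caps generated tokens only, so a long input
# no longer eats into the output budget the way max_length can
outputs = model.generate(input_ids, num_beams=4, max_new_tokens=64, early_stopping=True)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
```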

Key Takeaways

  • Enable beam search in Hugging Face by setting num_beams in the generate() method.
  • Use num_return_sequences to retrieve multiple diverse outputs from beam search.
  • Install transformers and a backend like torch to run generation with beam search.
  • Adjust early_stopping and max_length parameters to control generation behavior.
  • Beam search improves output quality but increases generation time proportional to num_beams.
Verified 2026-04 · t5-small, gpt2