What is beam search in language models
Beam search is a decoding algorithm that allows language models to generate sequences by keeping track of the top k most probable candidates at each step, rather than just the single most likely token. This strikes a balance between greedy decoding and exhaustive search, improving output quality without excessive computation.
How it works
Beam search works by maintaining a fixed number (beam width) of the most probable partial sequences at each step of generation. Instead of choosing only the single best next token (greedy search), it expands all candidates in the beam and keeps the top k sequences based on cumulative probability. This is like exploring multiple paths in a maze simultaneously, pruning less promising routes early to efficiently find the best overall path.
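Formally, each candidate is ranked by its cumulative log-probability, the sum of the per-token log-probabilities along the sequence; this is the standard formulation rather than anything implementation-specific:

$$
\text{score}(y_{1:t}) = \sum_{i=1}^{t} \log P(y_i \mid y_{<i})
$$

Working in log space turns products of probabilities into sums, which avoids numerical underflow on long sequences.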
Concrete example
Suppose a language model predicts the next-token log probabilities at one step as follows. The Python snippet below walks through a single beam-search step:
```python
beam_width = 2

# Current beams with their cumulative log probabilities
beams = [(['I'], -0.1), (['You'], -0.2)]

# Next-token log probabilities for each beam (token: log prob)
next_tokens = {
    'I':   {'am': -0.3, 'like': -1.0, 'love': -0.5},
    'You': {'are': -0.2, 'like': -0.4, 'love': -1.2},
}

# Expand every beam with every candidate next token
all_candidates = []
for seq, score in beams:
    last_word = seq[-1]
    for token, token_score in next_tokens[last_word].items():
        candidate = (seq + [token], score + token_score)
        all_candidates.append(candidate)

# Select the top beam_width candidates (higher log prob is better)
all_candidates.sort(key=lambda x: x[1], reverse=True)
beams = all_candidates[:beam_width]

for seq, score in beams:
    print(f"Sequence: {' '.join(seq)}, Score: {score}")
```

Output (the two survivors tie at -0.4; Python's stable sort keeps 'I am' first):

```
Sequence: I am, Score: -0.4
Sequence: You are, Score: -0.4
```
When to use it
Use beam search when you want higher-quality generated text than greedy decoding but cannot afford the computational cost of exhaustive search. It is ideal for tasks like machine translation, summarization, and dialogue generation where multiple plausible outputs exist. Avoid beam search if you need very fast, low-latency generation or if diversity in outputs is more important than likelihood.
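As one practical illustration, the Hugging Face transformers library exposes beam search through the num_beams argument of generate. A minimal sketch, assuming the publicly available t5-small translation checkpoint:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumes the publicly available t5-small checkpoint
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: The house is small.",
                   return_tensors="pt")

# num_beams > 1 switches generate() from greedy decoding to beam search;
# early_stopping=True ends the search once num_beams finished candidates exist
outputs = model.generate(**inputs, num_beams=4, early_stopping=True,
                         max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Setting num_beams=1 recovers greedy decoding, which is the faster choice when latency matters more than output quality.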
Key terms
| Term | Definition |
|---|---|
| Beam width | Number of candidate sequences kept at each step during beam search. |
| Greedy search | Decoding method selecting the single most probable token at each step. |
| Heuristic search | An approach that explores a subset of possibilities to find a good solution efficiently. |
| Cumulative probability | The combined probability of a sequence of tokens, often computed in log space for numerical stability (see the sketch after this table). |
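To see why cumulative probabilities are usually handled in log space, consider this small self-contained check: multiplying many small probabilities underflows to zero in double precision, while summing their logarithms stays stable.

```python
import math

probs = [1e-5] * 100

# Multiplying 100 probabilities of 1e-5 gives 1e-500, far below the
# smallest positive double (~1e-308), so the product underflows to 0.0
product = 1.0
for p in probs:
    product *= p
print(product)      # 0.0

# Summing log probabilities keeps the same ranking information intact
log_score = sum(math.log(p) for p in probs)
print(log_score)    # ≈ -1151.29
```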
Key takeaways
- Beam search balances quality and efficiency by exploring multiple candidate sequences simultaneously.
- Increasing beam width can improve output quality, but computation cost grows roughly linearly with the width and the quality gains diminish for large beams.
- Use beam search for tasks requiring coherent, high-probability text generation like translation or summarization.