What is greedy decoding in LLMs
Greedy decoding is a text generation method in large language models (LLMs) where the model selects the highest-probability token at each step without considering future tokens. It generates output quickly but can miss more coherent or diverse sequences than other decoding methods such as beam search or sampling.
How it works
Greedy decoding works by selecting the single token with the highest predicted probability at each step of text generation from an LLM. Imagine you are writing a sentence word by word and always pick the most likely next word without looking ahead. This is like taking the fastest route on a map without checking if a longer path might lead to a better destination.
This approach is simple and fast but can lead to repetitive or suboptimal text because it ignores alternative tokens that might produce better overall sequences.
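The selection rule described above can be sketched as a short loop. This is a minimal sketch, not a real model: `next_token_probs` is a hypothetical stand-in for an LLM's next-token probability distribution, hard-coded here for illustration.

```python
# Minimal sketch of greedy decoding over a toy "model".
# next_token_probs is a hypothetical stand-in for an LLM's
# next-token distribution given the tokens generated so far.

def next_token_probs(context):
    table = {
        (): {"the": 0.6, "a": 0.4},
        ("the",): {"quick": 0.5, "lazy": 0.3, "dog": 0.2},
        ("the", "quick"): {"fox": 0.7, "<eos>": 0.3},
        ("the", "quick", "fox"): {"<eos>": 0.9, "jumps": 0.1},
    }
    return table[tuple(context)]

def greedy_decode(max_tokens=10):
    tokens = []
    for _ in range(max_tokens):
        probs = next_token_probs(tokens)
        # Greedy step: pick the single highest-probability token,
        # with no lookahead and no backtracking.
        best = max(probs, key=probs.get)
        if best == "<eos>":
            break
        tokens.append(best)
    return tokens

print(greedy_decode())  # ['the', 'quick', 'fox']
```

Because each step is a plain argmax, the same context always yields the same output, which is what makes greedy decoding deterministic.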
Concrete example
Here is a Python example using the OpenAI SDK with gpt-4o. The API does not expose greedy decoding directly, but setting temperature=0 approximates it by making token selection (near-)deterministic:

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Complete the sentence: The quick brown fox"}],
    temperature=0,  # temperature=0 approximates greedy (highest-probability) selection
)
print(response.choices[0].message.content)
# The quick brown fox jumps over the lazy dog.
```
When to use it
Use greedy decoding when you need fast, deterministic, and reproducible text generation without randomness. It is suitable for tasks where the most likely output is preferred, such as simple completions or rule-based text generation.
Avoid greedy decoding when you want more diverse, creative, or contextually rich outputs, as it can get stuck in repetitive loops or miss better phrasing that requires exploring multiple token options (use beam search or sampling instead).
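The relationship between greedy decoding and sampling can be shown numerically: sampling draws from a temperature-scaled softmax, and as temperature drops toward 0 the distribution concentrates on the argmax token, which is why temperature=0 behaves like greedy decoding. The logit values below are made up purely for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/temperature before normalizing; lower
    # temperature sharpens the distribution toward the argmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up next-token logits

for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    # As t shrinks, nearly all probability mass moves to index 0,
    # the argmax token that greedy decoding would pick outright.
    print(t, [round(p, 3) for p in probs])
```

Sampling from the t=1.0 distribution still picks alternatives a meaningful fraction of the time, which is the diversity that greedy decoding gives up.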
Key terms
| Term | Definition |
|---|---|
| Greedy decoding | A decoding method that picks the highest probability token at each step without backtracking. |
| Temperature | A parameter controlling randomness in token selection; values at or near 0 make selection (near-)deterministic, approximating greedy decoding. |
| Beam search | A decoding method that keeps multiple candidate sequences to find better overall outputs. |
| Sampling | A decoding method that randomly selects tokens based on their probabilities to increase diversity. |
Key takeaways
- Greedy decoding always picks the most probable next token, making generation fast and deterministic.
- It can produce repetitive or suboptimal text because it ignores alternative token paths.
- Use greedy decoding for simple, predictable tasks but prefer beam search or sampling for richer outputs.