What is greedy decoding in LLMs
Greedy decoding is a text generation method in large language models (LLMs) where the model selects the highest-probability token at each step without considering future tokens. It generates output quickly but can miss more coherent or diverse sequences than other decoding methods such as beam search or sampling.
How it works
Greedy decoding works by selecting the single token with the highest predicted probability at each step of text generation from an LLM. Imagine you are writing a sentence word by word and always pick the most likely next word without looking ahead. This is like taking the fastest route on a map without checking if a longer path might lead to a better destination.
This approach is simple and fast but can lead to repetitive or suboptimal text because it ignores alternative tokens that might produce better overall sequences.
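The selection rule described above can be sketched as a short loop. This is a minimal sketch, not a real model: `next_token_probs` is a hypothetical stand-in for an LLM's next-token probability distribution, hard-coded here for illustration.

```python
# Minimal sketch of greedy decoding over a toy "model".
# next_token_probs is a hypothetical stand-in for an LLM's
# next-token distribution given the tokens generated so far.

def next_token_probs(context):
    table = {
        (): {"the": 0.6, "a": 0.4},
        ("the",): {"quick": 0.5, "lazy": 0.3, "dog": 0.2},
        ("the", "quick"): {"fox": 0.7, "<eos>": 0.3},
        ("the", "quick", "fox"): {"<eos>": 0.9, "jumps": 0.1},
    }
    return table[tuple(context)]

def greedy_decode(max_tokens=10):
    tokens = []
    for _ in range(max_tokens):
        probs = next_token_probs(tokens)
        # Greedy step: pick the single highest-probability token,
        # with no lookahead and no backtracking.
        best = max(probs, key=probs.get)
        if best == "<eos>":
            break
        tokens.append(best)
    return tokens

print(greedy_decode())  # ['the', 'quick', 'fox']
```

Because each step is a plain argmax, the same context always yields the same output, which is what makes greedy decoding deterministic.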
Concrete example
Here is a Python example using the OpenAI SDK with gpt-4o. The API does not expose greedy decoding directly, but setting temperature=0 approximates it by making token selection (near-)deterministic:

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Complete the sentence: The quick brown fox"}],
    temperature=0,  # temperature=0 approximates greedy (highest-probability) selection
)
print(response.choices[0].message.content)
# The quick brown fox jumps over the lazy dog.
```
When to use it
Use greedy decoding when you need fast, deterministic, and reproducible text generation without randomness. It is suitable for tasks where the most likely output is preferred, such as simple completions or rule-based text generation.
Avoid greedy decoding when you want more diverse, creative, or contextually rich outputs, as it can get stuck in repetitive loops or miss better phrasing that requires exploring multiple token options (use beam search or sampling instead).
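The relationship between greedy decoding and sampling can be shown numerically: sampling draws from a temperature-scaled softmax, and as temperature drops toward 0 the distribution concentrates on the argmax token, which is why temperature=0 behaves like greedy decoding. The logit values below are made up purely for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/temperature before normalizing; lower
    # temperature sharpens the distribution toward the argmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up next-token logits

for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    # As t shrinks, nearly all probability mass moves to index 0,
    # the argmax token that greedy decoding would pick outright.
    print(t, [round(p, 3) for p in probs])
```

Sampling from the t=1.0 distribution still picks alternatives a meaningful fraction of the time, which is the diversity that greedy decoding gives up.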
Key terms
| Term | Definition |
|---|---|
| Greedy decoding | A decoding method that picks the highest probability token at each step without backtracking. |
| Temperature | A parameter controlling randomness in token selection; values at or near 0 make selection (near-)deterministic, approximating greedy decoding. |
| Beam search | A decoding method that keeps multiple candidate sequences to find better overall outputs. |
| Sampling | A decoding method that randomly selects tokens based on their probabilities to increase diversity. |
Key takeaways
- Greedy decoding always picks the most probable next token, making generation fast and deterministic.
- It can produce repetitive or suboptimal text because it ignores alternative token paths.
- Use greedy decoding for simple, predictable tasks but prefer beam search or sampling for richer outputs.