What is top-p sampling in AI?
Top-p sampling, also called nucleus sampling, selects the next token from the smallest set of candidate tokens whose cumulative probability reaches a threshold p. This technique balances randomness and coherence by dynamically limiting the candidates to the most probable tokens, improving output diversity over fixed top-k sampling while keeping results plausible.
How it works
Top-p sampling works by first sorting all possible next tokens by their predicted probability from highest to lowest. It then accumulates these probabilities until their sum reaches the threshold p (e.g., 0.9). The model samples the next token only from this subset, called the nucleus. This dynamic cutoff adapts to the shape of the probability distribution, allowing more tokens when the distribution is flat and fewer when it is peaked.
Think of it like choosing a playlist: instead of always picking from the top 10 songs (top-k), you pick from the smallest group of songs that together make up 90% of your listening time (top-p). This way, you get variety without including very unlikely songs.
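The adaptive cutoff described above can be sketched with two hypothetical distributions (illustrative values, not real model output):

```python
def nucleus_size(probs, p=0.9):
    """Count tokens in the smallest set whose cumulative probability reaches p."""
    total, count = 0.0, 0
    for prob in sorted(probs, reverse=True):
        total += prob
        count += 1
        if total >= p:
            break
    return count

flat = [0.2, 0.2, 0.2, 0.2, 0.2]         # flat distribution: model is uncertain
peaked = [0.85, 0.05, 0.04, 0.03, 0.03]  # peaked distribution: model is confident

print(nucleus_size(flat))    # 5 -> the whole vocabulary stays in play
print(nucleus_size(peaked))  # 2 -> only the top tokens survive the cutoff
```

With the same p = 0.9, the flat distribution keeps all five tokens while the peaked one keeps only two, which is exactly the behavior a fixed top-k cannot reproduce.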
Concrete example
Suppose a language model predicts the next token probabilities as follows:
```python
tokens = ['the', 'a', 'an', 'this', 'that', 'some', 'any']
probabilities = [0.4, 0.25, 0.15, 0.1, 0.05, 0.03, 0.02]

# Tokens are sorted by probability (already sorted here)
# Calculate cumulative probabilities
cumulative_probs = []
cum_sum = 0
for p in probabilities:
    cum_sum += p
    cumulative_probs.append(cum_sum)

# Set top-p threshold
p_threshold = 0.9

# Find the smallest set whose cumulative probability reaches p_threshold
nucleus_tokens = []
for token, cum_prob in zip(tokens, cumulative_probs):
    nucleus_tokens.append(token)
    if cum_prob >= p_threshold:
        break

print('Nucleus tokens:', nucleus_tokens)
# Nucleus tokens: ['the', 'a', 'an', 'this']
```
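The example above stops at identifying the nucleus. To actually pick the next token, the nucleus probabilities are renormalized to sum to 1 and a token is drawn from them; a minimal sketch using only the standard library, continuing from the values above:

```python
import random

# Nucleus found above, with the original model probabilities
nucleus_tokens = ['the', 'a', 'an', 'this']
nucleus_probs = [0.4, 0.25, 0.15, 0.1]

# Renormalize so the nucleus probabilities sum to 1
total = sum(nucleus_probs)
weights = [p / total for p in nucleus_probs]

# Sample the next token from the nucleus
next_token = random.choices(nucleus_tokens, weights=weights, k=1)[0]
print(next_token)  # one of 'the', 'a', 'an', 'this'
```

Tokens outside the nucleus ('that', 'some', 'any') now have zero probability of being chosen, which is how top-p sampling excludes implausible continuations.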
When to use it
Use top-p sampling when you want a balance between creativity and coherence in generated text. It is ideal for open-ended generation tasks like story writing, dialogue, or brainstorming, where diversity is important but outputs should remain plausible.
Do not use top-p sampling when deterministic or highly precise outputs are required, such as code generation or factual question answering, where greedy or beam search decoding is preferred.
Key terms
| Term | Definition |
|---|---|
| Top-p sampling | A decoding method selecting tokens from the smallest set whose cumulative probability exceeds p. |
| Nucleus | The subset of tokens considered for sampling in top-p sampling. |
| Probability distribution | The model's predicted likelihoods for each possible next token. |
| Threshold p | The cumulative probability cutoff parameter in top-p sampling, typically between 0 and 1. |
Key takeaways
- Top-p sampling dynamically limits token choices to maintain output diversity and coherence.
- It adapts to the model's probability distribution shape, unlike fixed top-k sampling.
- Use top-p sampling for creative text generation, not for deterministic tasks.