Concept · Beginner to Intermediate · 3 min read

What is top-p sampling in prompts?

Quick answer
Top-p sampling, also called nucleus sampling, is a text generation method that chooses the next token from the smallest set of tokens whose cumulative probability reaches a threshold p. It balances randomness and coherence by dynamically limiting the candidate pool to the most likely tokens at each generation step.

How it works

Top-p sampling works by sorting the model's predicted next tokens by their probability and then selecting the smallest group of tokens whose combined probability is at least p (e.g., 0.9). The next token is then randomly sampled from this group. This method dynamically adapts the candidate pool size based on the distribution, unlike fixed top-k sampling.
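The selection step above can be sketched in a few lines of plain Python. This is a minimal illustration of the technique, not any library's API; the token names and probabilities are invented for the example.

```python
import random

def top_p_sample(token_probs, p=0.9, rng=random):
    """Sample a token from the nucleus: the smallest set of tokens
    whose cumulative probability is at least p."""
    # Sort tokens from most to least probable.
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    # Accumulate probabilities until the threshold p is reached.
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize within the nucleus and sample from it.
    tokens, probs = zip(*nucleus)
    total = sum(probs)
    return rng.choices(tokens, weights=[pr / total for pr in probs], k=1)[0]

# Toy next-token distribution (made-up numbers for illustration).
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "zebra": 0.04, "xylophone": 0.01}
print(top_p_sample(probs, p=0.9))  # samples only from cat, dog, or fish
```

With p=0.9, the nucleus here contains cat, dog, and fish (cumulative 0.95); the two unlikely tokens are never considered, which is exactly the coherence benefit described above.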

Think of it like choosing from a menu where you only consider the most popular dishes that together make up 90% of all orders, ignoring the rare items to keep choices relevant but varied.

Concrete example

Below is a Python example using the OpenAI SDK with the gpt-4o model and top_p set to 0.9. This tells the model to sample each token from the nucleus covering the top 90% of probability mass.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a creative story opening."}],
    top_p=0.9,
    max_tokens=50
)

print(response.choices[0].message.content)
output (example only; sampled text varies between runs)
Once upon a time, in a land where the skies shimmered with endless colors, a young explorer set out on a journey to discover hidden secrets.

When to use it

Use top-p sampling when you want a balance between creativity and coherence in generated text. It is ideal for creative writing, dialogue generation, and scenarios where diversity is important but you want to avoid unlikely or nonsensical tokens.

Avoid top-p when you need deterministic or highly precise outputs, such as code generation or factual answers. In those cases prefer greedy decoding, which most hosted APIs approximate when you set temperature to 0, or beam search where the framework supports it.

Key terms

Top-p sampling: Sampling method that selects tokens from the smallest set whose cumulative probability reaches p.
Nucleus sampling: Another name for top-p sampling.
Token probability: The likelihood the model assigns to each possible next token.
Cumulative probability: The running sum of token probabilities, taken in order from the most to the least likely token.

Key Takeaways

  • Top-p sampling dynamically limits token choices to the most probable subset covering threshold p.
  • It balances randomness and coherence better than fixed top-k sampling.
  • Use top-p for creative, diverse text generation but not for deterministic outputs.
  • Setting p closer to 1 increases diversity; lower p makes output more focused.
  • Supported in major APIs like OpenAI's gpt-4o via the top_p parameter.
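To see how the threshold changes the candidate pool, the sketch below counts the nucleus size at several values of p for a toy distribution (the probabilities are invented for illustration):

```python
def nucleus_size(token_probs, p):
    """Count how many tokens fall inside the top-p nucleus."""
    cumulative, count = 0.0, 0
    for prob in sorted(token_probs.values(), reverse=True):
        cumulative += prob
        count += 1
        if cumulative >= p:
            break
    return count

# Toy next-token distribution (made-up numbers for illustration).
probs = {"the": 0.5, "a": 0.2, "an": 0.12, "this": 0.1, "that": 0.05, "those": 0.03}
for p in (0.5, 0.9, 0.99):
    print(p, nucleus_size(probs, p))  # nucleus grows as p approaches 1
```

Raising p from 0.5 to 0.99 grows the nucleus from one token to all six, which is why higher p produces more diverse output and lower p more focused output.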
Verified 2026-04 · gpt-4o