How to use top_p in the OpenAI API

Quick answer

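Use the top_p parameter in the OpenAI API to control nucleus sampling, which restricts token selection to the smallest set of most likely tokens whose cumulative probability reaches top_p. Pass a value between 0 and 1 (the default is 1) to the chat.completions.create method to adjust output randomness. It can be set alongside temperature, but OpenAI's API reference generally recommends changing one of the two, not both.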
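To make the idea of a cumulative probability mass concrete, here is a minimal pure-Python sketch of nucleus filtering on a toy distribution. It is an illustration only: the function name nucleus_filter and the four-token distribution are made up for this example, and the actual sampling happens on OpenAI's servers, whose exact implementation is not documented here.

```python
import random

def nucleus_filter(probs, top_p):
    """Keep the smallest set of top-ranked tokens whose cumulative probability reaches top_p."""
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}

# Hypothetical next-token distribution, invented for illustration.
toy_probs = {"the": 0.4, "a": 0.3, "robot": 0.2, "zebra": 0.1}

filtered = nucleus_filter(toy_probs, top_p=0.8)
print(filtered)  # "zebra" falls outside the nucleus and is dropped

# Sample one token from the filtered distribution.
token = random.choices(list(filtered), weights=list(filtered.values()), k=1)[0]
print(token)
```

With top_p=1.0 every token survives, which is why the default leaves the distribution untouched; very small values keep only the single most likely token.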

PREREQUISITES

Python 3.8+
OpenAI API key (free tier works)
pip install "openai>=1.0" (quote the requirement so the shell does not treat > as a redirect)

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable.

Install SDK: pip install openai
Set environment variable in your shell:
export OPENAI_API_KEY='your_api_key_here' (Linux/macOS)
setx OPENAI_API_KEY "your_api_key_here" (Windows; takes effect in newly opened terminals)

bash

pip install openai
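Before making a request, it can help to confirm that Python actually sees the variable. The helper below is a small sketch, not part of the OpenAI SDK; the name require_api_key and the example key are hypothetical.

```python
import os

def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the given mapping, or raise a readable error."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples.")
    return key

# Demonstration with a made-up mapping instead of the real environment.
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
```

In a real script you would call require_api_key() with no arguments and pass the result as api_key when constructing the client.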

Step by step

This example passes the top_p parameter to the gpt-4o model when generating a chat completion. With top_p set to 0.8, the model samples only from the most likely tokens that together account for 80% of the probability mass, which trims the unlikely tail and reduces randomness.

python

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a creative short story about a robot."}],
    top_p=0.8
)

print(response.choices[0].message.content)

output

Once upon a time, in a world where robots dreamed, there was one who wished to paint the stars...

Common variations

You can set top_p together with temperature, as shown below, although OpenAI's API reference generally recommends adjusting one of the two and leaving the other at its default. Lower top_p values restrict sampling to high-probability tokens, while higher values allow more diversity. top_p also works with other models such as gpt-4o-mini and in streaming mode.
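For the streaming case, a minimal sketch might look like the following. The helper name stream_reply is hypothetical; the stream=True argument and the chunk.choices[0].delta.content field follow the SDK v1 streaming interface. The live call only runs when OPENAI_API_KEY is set.

```python
import os

def stream_reply(client, prompt, top_p=0.8, model="gpt-4o-mini"):
    """Stream a chat completion, printing text as it arrives, and return the full reply."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        top_p=top_p,
        stream=True,
    )
    parts = []
    for chunk in stream:
        # Some chunks carry no choices or no text, so skip those.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)

if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    stream_reply(client, "Explain quantum computing in simple terms.", top_p=0.9)
```

Taking the client as an argument keeps the helper easy to exercise with a stand-in object when no network access is available.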

python

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Using top_p with temperature
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}],
    top_p=0.9,
    temperature=0.7
)

print(response.choices[0].message.content)

output

Quantum computing uses quantum bits, or qubits, which can be both 0 and 1 at the same time, allowing computers to solve certain problems much faster than classical computers.

Troubleshooting

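If completions are too random, lower top_p or temperature; if they are too repetitive or flat, raise them. top_p accepts values from 0 to 1 (the default of 1 applies no truncation), while temperature accepts values from 0 to 2. Values near 0 make output nearly deterministic, though identical results across calls are not guaranteed. If you hit authentication errors, check that OPENAI_API_KEY is set in the environment the script runs in; os.environ["OPENAI_API_KEY"] raises a KeyError when the variable is missing.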
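One way to tune the value is to run the same prompt at several top_p settings and compare the replies side by side. The sketch below assumes a helper named compare_top_p (hypothetical) and leaves temperature at its default so that top_p is the only variable. The live call only runs when OPENAI_API_KEY is set.

```python
import os

def compare_top_p(client, prompt, values=(0.2, 0.5, 0.9), model="gpt-4o-mini"):
    """Return a dict mapping each top_p value to the reply generated with it."""
    results = {}
    for top_p in values:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            top_p=top_p,  # temperature is left at its default
        )
        results[top_p] = response.choices[0].message.content
    return results

if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    replies = compare_top_p(client, "Write one sentence about a robot.")
    for top_p, text in replies.items():
        print(f"top_p={top_p}: {text}")
```

Because sampling is random, a single reply per setting is only a rough guide; running each setting a few times gives a better picture.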

Key Takeaways

Use top_p to control output randomness by limiting sampling to the most likely tokens whose cumulative probability reaches the chosen value.
top_p can be set together with temperature, but OpenAI's API reference generally recommends adjusting one or the other, not both.
Set top_p between 0 and 1; lower values produce more focused, deterministic output.
Use the OpenAI SDK v1+ client pattern and read the API key from os.environ rather than hard-coding it in source files.
Test different top_p values to find the best balance for your specific use case.

Verified 2026-04 · gpt-4o, gpt-4o-mini