How-to · Beginner · 3 min read

How to use temperature and top_p in Hugging Face

Quick answer
In Hugging Face's text-generation pipeline or the model's generate() API, use the temperature parameter to control randomness (higher values produce more diverse outputs) and top_p (nucleus sampling) to restrict token selection to the smallest set whose cumulative probability reaches the threshold. Pass both as arguments to generate() or the pipeline call, together with do_sample=True, to tune output creativity.

PREREQUISITES

  • Python 3.8+
  • pip install "transformers>=4.30.0" (quoted so the shell doesn't treat >= as a redirect)
  • pip install torch (or tensorflow)
  • Basic knowledge of Hugging Face Transformers

Setup

Install the transformers library and a backend like torch or tensorflow. Import the necessary classes and load a pretrained text generation model and tokenizer.

bash
pip install transformers torch

Step by step

Use the generate method with temperature and top_p parameters to control output randomness and diversity; both only take effect when do_sample=True. Lower temperature (e.g., 0.7) makes output more focused, while higher (e.g., 1.2) increases creativity. top_p keeps only the smallest set of top-ranked tokens whose cumulative probability reaches the threshold (e.g., 0.9) and samples from that set.

python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

input_text = "The future of AI is"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate with temperature and top_p
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,  # Controls randomness
    top_p=0.9          # Nucleus sampling
)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
output (example only; sampling is nondeterministic, so your text will differ)
The future of AI is rapidly evolving, with new breakthroughs in natural language processing and machine learning enabling unprecedented capabilities in automation and creativity.
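To see what temperature actually does to the model's next-token distribution, here is a minimal, framework-free sketch using hypothetical token scores (toy logits, not tied to any real model): dividing logits by a temperature below 1 sharpens the distribution, while a temperature above 1 flattens it.

```python
import math

def apply_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax into probabilities."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four candidate tokens

cold = apply_temperature(logits, 0.5)  # sharper: the top token dominates
hot = apply_temperature(logits, 1.5)   # flatter: probability mass spreads out

print(f"temperature 0.5 -> top prob {max(cold):.3f}")
print(f"temperature 1.5 -> top prob {max(hot):.3f}")
```

The top token's probability is higher under the cold setting than the hot one, which is why low temperatures read as "focused" and high temperatures as "creative".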

Common variations

  • Use the Hugging Face pipeline for simpler calls with temperature and top_p as arguments.
  • Set do_sample=False for deterministic (greedy) output; in transformers, temperature must be a strictly positive float, so temperature=0 is not a valid way to disable randomness.
  • Combine top_k with top_p for hybrid sampling.
  • Use different models like gpt2-xl or EleutherAI/gpt-j-6B.
python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

result = generator(
    "The future of AI is",
    max_new_tokens=50,
    do_sample=True,
    temperature=1.0,
    top_p=0.85
)

print(result[0]['generated_text'])
output (example only; sampled output varies between runs)
The future of AI is bright and full of potential, with innovations that will transform industries and improve lives worldwide.
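The hybrid top_k + top_p variation mentioned above applies both filters in sequence: first keep the k highest-probability tokens, then trim that set to the nucleus. A small sketch with a toy probability distribution (illustrative values, not HF internals) makes the interaction concrete:

```python
def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Return the indices that survive top-k filtering followed by top-p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # keep only the k most probable tokens
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:  # nucleus reached: stop adding tokens
            break
    return kept

probs = [0.5, 0.25, 0.15, 0.07, 0.03]  # hypothetical, already sorted
print(top_k_top_p_filter(probs, top_k=4, top_p=0.9))  # → [0, 1, 2]
```

Here top_k=4 admits four candidates, but top_p=0.9 cuts the set down to three, since 0.5 + 0.25 + 0.15 already reaches the 0.9 mass. Sampling then draws only from the surviving tokens (after renormalizing their probabilities).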

Troubleshooting

  • If output is repetitive or dull, increase temperature or raise top_p to admit more candidate tokens.
  • If output is nonsensical, lower temperature or reduce top_p to restrict sampling to high-probability tokens.
  • Ensure do_sample=True to enable sampling; otherwise, temperature and top_p have no effect.
  • Watch for generation warnings such as "temperature is set ... but do_sample is False": they mean your sampling parameters are being silently ignored.
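The do_sample point is worth internalizing: with do_sample=False, generation is greedy and always picks the single most probable token, so temperature and top_p have nothing to act on. A toy comparison (hypothetical probabilities, not the actual transformers code path) shows the difference:

```python
import random

def greedy_pick(probs):
    # do_sample=False: always take the highest-probability token
    return max(range(len(probs)), key=lambda i: probs[i])

def sample_pick(probs, rng):
    # do_sample=True: draw a token from the (possibly filtered) distribution
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = [0.6, 0.3, 0.1]  # hypothetical next-token distribution
rng = random.Random(0)

print(greedy_pick(probs))  # always index 0
print({sample_pick(probs, rng) for _ in range(100)})  # a mix of indices
```

Greedy decoding returns the same index every call, while sampling visits lower-probability tokens in proportion to their weight, which is exactly the diversity that temperature and top_p shape.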

Key Takeaways

  • Use temperature to control randomness: lower values yield focused text, higher values increase creativity.
  • Use top_p for nucleus sampling to limit token choices to a probability mass, balancing diversity and coherence.
  • Always set do_sample=True to activate sampling parameters like temperature and top_p.
  • The Hugging Face pipeline simplifies usage of these parameters for quick prototyping.
  • Adjust parameters iteratively to find the best balance for your specific text generation task.
Verified 2026-04 · gpt2, gpt2-xl, EleutherAI/gpt-j-6B