How-to · Beginner · 3 min read

How to use temperature and top_p in Hugging Face

Quick answer
In Hugging Face's text-generation pipeline or the model's generate() API, use the temperature parameter to control randomness (higher values produce more diverse outputs) and top_p (nucleus sampling) to restrict token selection to the smallest set whose cumulative probability reaches the threshold. Pass both as arguments to generate() or the pipeline call, together with do_sample=True, to tune output creativity.

PREREQUISITES

  • Python 3.8+
  • pip install "transformers>=4.30.0" (quoted so the shell doesn't treat >= as a redirect)
  • pip install torch (or tensorflow)
  • Basic knowledge of Hugging Face Transformers

Setup

Install the transformers library and a backend like torch or tensorflow. Import the necessary classes and load a pretrained text generation model and tokenizer.

bash
pip install transformers torch

Step by step

Use the generate method with temperature and top_p parameters to control output randomness and diversity; both only take effect when do_sample=True. Lower temperature (e.g., 0.7) makes output more focused, while higher (e.g., 1.2) increases creativity. top_p keeps only the smallest set of top-ranked tokens whose cumulative probability reaches the threshold (e.g., 0.9) and samples from that set.

python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

input_text = "The future of AI is"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate with temperature and top_p
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,  # Controls randomness
    top_p=0.9          # Nucleus sampling
)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
output (example only; sampling is nondeterministic, so your text will differ)
The future of AI is rapidly evolving, with new breakthroughs in natural language processing and machine learning enabling unprecedented capabilities in automation and creativity.
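To see what temperature actually does to the model's next-token distribution, here is a minimal, framework-free sketch using hypothetical token scores (toy logits, not tied to any real model): dividing logits by a temperature below 1 sharpens the distribution, while a temperature above 1 flattens it.

```python
import math

def apply_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax into probabilities."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four candidate tokens

cold = apply_temperature(logits, 0.5)  # sharper: the top token dominates
hot = apply_temperature(logits, 1.5)   # flatter: probability mass spreads out

print(f"temperature 0.5 -> top prob {max(cold):.3f}")
print(f"temperature 1.5 -> top prob {max(hot):.3f}")
```

The top token's probability is higher under the cold setting than the hot one, which is why low temperatures read as "focused" and high temperatures as "creative".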

Common variations

  • Use the Hugging Face pipeline for simpler calls with temperature and top_p as arguments.
  • Set do_sample=False for deterministic (greedy) output; in transformers, temperature must be a strictly positive float, so temperature=0 is not a valid way to disable randomness.
  • Combine top_k with top_p for hybrid sampling.
  • Use different models like gpt2-xl or EleutherAI/gpt-j-6B.
python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

result = generator(
    "The future of AI is",
    max_new_tokens=50,
    do_sample=True,
    temperature=1.0,
    top_p=0.85
)

print(result[0]['generated_text'])
output (example only; sampled output varies between runs)
The future of AI is bright and full of potential, with innovations that will transform industries and improve lives worldwide.
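The hybrid top_k + top_p variation mentioned above applies both filters in sequence: first keep the k highest-probability tokens, then trim that set to the nucleus. A small sketch with a toy probability distribution (illustrative values, not HF internals) makes the interaction concrete:

```python
def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Return the indices that survive top-k filtering followed by top-p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # keep only the k most probable tokens
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:  # nucleus reached: stop adding tokens
            break
    return kept

probs = [0.5, 0.25, 0.15, 0.07, 0.03]  # hypothetical, already sorted
print(top_k_top_p_filter(probs, top_k=4, top_p=0.9))  # → [0, 1, 2]
```

Here top_k=4 admits four candidates, but top_p=0.9 cuts the set down to three, since 0.5 + 0.25 + 0.15 already reaches the 0.9 mass. Sampling then draws only from the surviving tokens (after renormalizing their probabilities).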

Troubleshooting

  • If output is repetitive or dull, increase temperature or raise top_p to admit more candidate tokens.
  • If output is nonsensical, lower temperature or reduce top_p to restrict sampling to high-probability tokens.
  • Ensure do_sample=True to enable sampling; otherwise, temperature and top_p have no effect.
  • Watch for generation warnings such as "temperature is set ... but do_sample is False": they mean your sampling parameters are being silently ignored.
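The do_sample point is worth internalizing: with do_sample=False, generation is greedy and always picks the single most probable token, so temperature and top_p have nothing to act on. A toy comparison (hypothetical probabilities, not the actual transformers code path) shows the difference:

```python
import random

def greedy_pick(probs):
    # do_sample=False: always take the highest-probability token
    return max(range(len(probs)), key=lambda i: probs[i])

def sample_pick(probs, rng):
    # do_sample=True: draw a token from the (possibly filtered) distribution
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = [0.6, 0.3, 0.1]  # hypothetical next-token distribution
rng = random.Random(0)

print(greedy_pick(probs))  # always index 0
print({sample_pick(probs, rng) for _ in range(100)})  # a mix of indices
```

Greedy decoding returns the same index every call, while sampling visits lower-probability tokens in proportion to their weight, which is exactly the diversity that temperature and top_p shape.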

Key Takeaways

  • Use temperature to control randomness: lower values yield focused text, higher values increase creativity.
  • Use top_p for nucleus sampling to limit token choices to a probability mass, balancing diversity and coherence.
  • Always set do_sample=True to activate sampling parameters like temperature and top_p.
  • The Hugging Face pipeline simplifies usage of these parameters for quick prototyping.
  • Adjust parameters iteratively to find the best balance for your specific text generation task.
Verified 2026-04 · gpt2, gpt2-xl, EleutherAI/gpt-j-6B