How-to · Beginner · 3 min read

How to set max new tokens in Hugging Face

Quick answer
In Hugging Face Transformers, cap the number of newly generated tokens by passing max_new_tokens to a model's generate() method, or as a keyword argument when calling a text-generation pipeline. For example, model.generate(input_ids, max_new_tokens=50) limits generation to 50 new tokens beyond the prompt.

PREREQUISITES

  • Python 3.8+
  • pip install transformers>=4.30
  • pip install torch or tensorflow
  • Basic knowledge of Hugging Face transformers

Setup

Install the Hugging Face transformers library together with a backend such as PyTorch or TensorFlow.

bash
pip install transformers torch

Step by step

Use the generate() method with the max_new_tokens argument to control the length of generated text. This example uses a GPT-2 model to generate up to 50 new tokens.

python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer and model
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Encode input prompt
input_text = "The future of AI is"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate with max_new_tokens set; pad_token_id silences a warning,
# since GPT-2 has no pad token of its own
outputs = model.generate(input_ids, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)

# Decode and print output
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
output (example; your exact text depends on the model and decoding settings)
The future of AI is expected to revolutionize many industries, including healthcare, finance, and education. With advancements in machine learning and natural language processing, AI systems will become more capable and accessible.
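One way to see why max_new_tokens is often easier to reason about than max_length: with max_length the budget for the continuation shrinks as the prompt grows, while max_new_tokens fixes it. A small arithmetic sketch (the helper names are illustrative, not part of the transformers API):

```python
def total_length(input_len: int, max_new_tokens: int) -> int:
    """Total sequence length when generating with max_new_tokens."""
    return input_len + max_new_tokens

def continuation_budget(input_len: int, max_length: int) -> int:
    """New tokens still allowed when generating with max_length (a total cap)."""
    return max(0, max_length - input_len)

# max_new_tokens=50 always allows 50 new tokens, regardless of prompt size
print(total_length(6, 50))    # 56 total tokens for a 6-token prompt
print(total_length(200, 50))  # 250 total tokens for a 200-token prompt

# max_length=50 leaves less room as the prompt grows
print(continuation_budget(6, 50))    # 44 new tokens
print(continuation_budget(200, 50))  # 0 new tokens: nothing is generated
```

This is why a long prompt combined with a small max_length can silently produce an empty continuation, while max_new_tokens behaves predictably.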

Common variations

  • Use max_length instead of max_new_tokens, but note max_length limits total tokens including input.
  • For pipelines, pass max_new_tokens as a keyword argument when calling the pipeline; it is forwarded to generate() under the hood.
  • Adjust generation parameters like temperature or top_p alongside max_new_tokens.
python
from transformers import pipeline

text_gen = pipeline("text-generation", model="gpt2")
result = text_gen("Hello world", max_new_tokens=30)
print(result[0]['generated_text'])
output (example; your exact text may differ)
Hello world, this is an example of text generation using Hugging Face transformers with a limit on new tokens.
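If you combine sampling parameters with max_new_tokens (the third bullet above), it helps to know what temperature actually does: it divides the logits before the softmax, so values below 1.0 sharpen the distribution and values above 1.0 flatten it. A minimal, library-free sketch of that rescaling (conceptual, not transformers code):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits into probabilities, rescaled by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, 0.5))  # sharper: top token dominates
print(softmax_with_temperature(logits, 2.0))  # flatter: sampling is more random
```

Lower temperature plus a tight max_new_tokens gives short, focused continuations; higher temperature makes the same length budget produce more varied text.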

Troubleshooting

  • If generation is too short, increase max_new_tokens.
  • If you get errors about token limits, check model max context size and adjust inputs accordingly.
  • Ensure your transformers version is recent enough to support max_new_tokens (the 4.30+ listed in the prerequisites is more than sufficient).
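For the context-size errors in the second bullet: the prompt plus max_new_tokens must fit within the model's context window (model.config.max_position_embeddings, which is 1024 for GPT-2). One common fix is to keep only the most recent prompt tokens. A sketch of that arithmetic on a plain list of token ids (the function name is illustrative, not a transformers API):

```python
def fit_prompt(input_ids, context_size, max_new_tokens):
    """Trim the oldest prompt tokens so prompt + generation fits the context."""
    budget = context_size - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens does not fit in the context window")
    return input_ids[-budget:]  # keep only the most recent tokens

ids = list(range(2000))          # pretend this is a 2000-token prompt
trimmed = fit_prompt(ids, 1024, 50)
print(len(trimmed))              # 974 tokens: 974 + 50 == 1024
```

Short prompts pass through unchanged; only prompts that would overflow the window are trimmed from the front.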

Key Takeaways

  • Use max_new_tokens in generate() to limit only the newly generated tokens.
  • Avoid max_length if you want to exclude input tokens from the limit.
  • Pass max_new_tokens directly in Hugging Face pipelines for easy usage.
  • Check your transformers library version to ensure support for max_new_tokens.
  • Adjust other generation parameters alongside max_new_tokens for best results.
Verified 2026-04 · gpt2