How-to · Beginner · 3 min read

How to set max new tokens in Hugging Face

Quick answer
In Hugging Face Transformers, cap the number of newly generated tokens by passing max_new_tokens to a model's generate() method, or as a keyword argument when calling a text-generation pipeline. For example, model.generate(input_ids, max_new_tokens=50) limits generation to 50 new tokens beyond the prompt.

PREREQUISITES

  • Python 3.8+
  • pip install transformers>=4.30
  • pip install torch or tensorflow
  • Basic knowledge of Hugging Face transformers

Setup

Install the Hugging Face transformers library together with a backend such as PyTorch or TensorFlow.

bash
pip install transformers torch

Step by step

Use the generate() method with the max_new_tokens argument to control the length of generated text. This example uses a GPT-2 model to generate up to 50 new tokens.

python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer and model
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Encode input prompt
input_text = "The future of AI is"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate with max_new_tokens set; pad_token_id silences a warning,
# since GPT-2 has no pad token of its own
outputs = model.generate(input_ids, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)

# Decode and print output
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
output (example; your exact text depends on the model and decoding settings)
The future of AI is expected to revolutionize many industries, including healthcare, finance, and education. With advancements in machine learning and natural language processing, AI systems will become more capable and accessible.
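One way to see why max_new_tokens is often easier to reason about than max_length: with max_length the budget for the continuation shrinks as the prompt grows, while max_new_tokens fixes it. A small arithmetic sketch (the helper names are illustrative, not part of the transformers API):

```python
def total_length(input_len: int, max_new_tokens: int) -> int:
    """Total sequence length when generating with max_new_tokens."""
    return input_len + max_new_tokens

def continuation_budget(input_len: int, max_length: int) -> int:
    """New tokens still allowed when generating with max_length (a total cap)."""
    return max(0, max_length - input_len)

# max_new_tokens=50 always allows 50 new tokens, regardless of prompt size
print(total_length(6, 50))    # 56 total tokens for a 6-token prompt
print(total_length(200, 50))  # 250 total tokens for a 200-token prompt

# max_length=50 leaves less room as the prompt grows
print(continuation_budget(6, 50))    # 44 new tokens
print(continuation_budget(200, 50))  # 0 new tokens: nothing is generated
```

This is why a long prompt combined with a small max_length can silently produce an empty continuation, while max_new_tokens behaves predictably.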

Common variations

  • Use max_length instead of max_new_tokens, but note max_length limits total tokens including input.
  • For pipelines, pass max_new_tokens as a keyword argument when calling the pipeline; it is forwarded to generate() under the hood.
  • Adjust generation parameters like temperature or top_p alongside max_new_tokens.
python
from transformers import pipeline

text_gen = pipeline("text-generation", model="gpt2")
result = text_gen("Hello world", max_new_tokens=30)
print(result[0]['generated_text'])
output (example; your exact text may differ)
Hello world, this is an example of text generation using Hugging Face transformers with a limit on new tokens.
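If you combine sampling parameters with max_new_tokens (the third bullet above), it helps to know what temperature actually does: it divides the logits before the softmax, so values below 1.0 sharpen the distribution and values above 1.0 flatten it. A minimal, library-free sketch of that rescaling (conceptual, not transformers code):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits into probabilities, rescaled by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, 0.5))  # sharper: top token dominates
print(softmax_with_temperature(logits, 2.0))  # flatter: sampling is more random
```

Lower temperature plus a tight max_new_tokens gives short, focused continuations; higher temperature makes the same length budget produce more varied text.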

Troubleshooting

  • If generation is too short, increase max_new_tokens.
  • If you get errors about token limits, check model max context size and adjust inputs accordingly.
  • Ensure your transformers version is recent enough to support max_new_tokens (the 4.30+ listed in the prerequisites is more than sufficient).
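For the context-size errors in the second bullet: the prompt plus max_new_tokens must fit within the model's context window (model.config.max_position_embeddings, which is 1024 for GPT-2). One common fix is to keep only the most recent prompt tokens. A sketch of that arithmetic on a plain list of token ids (the function name is illustrative, not a transformers API):

```python
def fit_prompt(input_ids, context_size, max_new_tokens):
    """Trim the oldest prompt tokens so prompt + generation fits the context."""
    budget = context_size - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens does not fit in the context window")
    return input_ids[-budget:]  # keep only the most recent tokens

ids = list(range(2000))          # pretend this is a 2000-token prompt
trimmed = fit_prompt(ids, 1024, 50)
print(len(trimmed))              # 974 tokens: 974 + 50 == 1024
```

Short prompts pass through unchanged; only prompts that would overflow the window are trimmed from the front.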

Key Takeaways

  • Use max_new_tokens in generate() to limit only the newly generated tokens.
  • Avoid max_length if you want to exclude input tokens from the limit.
  • Pass max_new_tokens directly in Hugging Face pipelines for easy usage.
  • Check your transformers library version to ensure support for max_new_tokens.
  • Adjust other generation parameters alongside max_new_tokens for best results.
Verified 2026-04 · gpt2