How to use the Hugging Face summarization pipeline
Quick answer

Use the Hugging Face `transformers` library's `pipeline` function with the task set to `summarization`. Load a pretrained model like `facebook/bart-large-cnn` and pass your text to get a concise summary.

Prerequisites

- Python 3.8+
- `pip install transformers>=4.0.0`
- `pip install torch` (or `tensorflow`)
- Internet connection to download pretrained models
Setup
Install the `transformers` library and a backend like `torch` or `tensorflow`. Set up your Python environment to run the summarization pipeline.

```shell
pip install transformers torch
```

Step by step
Use the pipeline API from transformers to create a summarization pipeline with a pretrained model. Pass your input text to get a summary.
```python
from transformers import pipeline

# Initialize a summarization pipeline with a pretrained model
summarizer = pipeline('summarization', model='facebook/bart-large-cnn')

# Input text to summarize
text = (
    "The Hugging Face Transformers library provides thousands of pretrained "
    "models to perform tasks on texts such as classification, information "
    "extraction, question answering, summarization, translation, and more."
)

# Generate summary
summary = summarizer(text, max_length=50, min_length=25, do_sample=False)
print('Summary:', summary[0]['summary_text'])
```

Output

```
Summary: The Hugging Face Transformers library provides pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, and more.
```
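The `max_length=50, min_length=25` values above are fixed; for inputs of varying size it can help to scale both bounds to the input. Below is a minimal sketch of a hypothetical helper (not part of `transformers`) that derives them from a whitespace word count, which only roughly approximates the model's tokenizer:

```python
def summary_lengths(text, ratio=0.3, floor=10, cap=142):
    """Pick (min_length, max_length) from the input's word count.

    Whitespace words only approximate model tokens, so treat the
    result as a starting point, not an exact token budget.
    """
    n_words = len(text.split())
    max_len = max(floor * 2, min(cap, int(n_words * ratio)))
    min_len = max(floor, max_len // 2)
    return min_len, max_len

min_len, max_len = summary_lengths("word " * 200)  # -> (30, 60)
# Then pass the bounds to the pipeline:
# summary = summarizer(text, min_length=min_len, max_length=max_len)
```

The `cap` default of 142 mirrors a common generation cap for CNN/DailyMail-style summaries, but any cap appropriate to your use case works.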
Common variations

- Use different models like `t5-small` or `google/pegasus-xsum` for summarization.
- Adjust the `max_length` and `min_length` parameters to control summary length.
- Use the pipeline asynchronously with `asyncio` for batch processing.
```python
from transformers import pipeline
import asyncio

# Create the pipeline once so the model is not reloaded on every call
summarizer = pipeline('summarization', model='facebook/bart-large-cnn')

async def async_summarize(text):
    # Run the blocking pipeline call in a worker thread
    summary = await asyncio.to_thread(
        summarizer, text, max_length=60, min_length=30, do_sample=False
    )
    print('Async summary:', summary[0]['summary_text'])

text = "Hugging Face provides easy-to-use APIs for state-of-the-art NLP models."
asyncio.run(async_summarize(text))
```

Output

```
Async summary: Hugging Face provides easy-to-use APIs for state-of-the-art NLP models.
```
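To batch several texts concurrently, the same `asyncio.to_thread` pattern can be combined with `asyncio.gather`. The sketch below uses a placeholder `summarize` function so it stays self-contained; in practice you would pass the pipeline object from the example above instead:

```python
import asyncio

def summarize(text):
    # Placeholder for the blocking pipeline call, e.g.
    # summarizer(text, max_length=60, min_length=30, do_sample=False)
    return [{'summary_text': text[:40]}]

async def summarize_all(texts):
    # Run each blocking call in its own worker thread and wait for all
    tasks = [asyncio.to_thread(summarize, t) for t in texts]
    results = await asyncio.gather(*tasks)
    return [r[0]['summary_text'] for r in results]

texts = [
    "Hugging Face provides easy-to-use APIs for NLP.",
    "Summarization condenses long documents into short ones.",
]
summaries = asyncio.run(summarize_all(texts))
print(summaries)
```

Note that `transformers` pipelines also accept a list of texts directly, which lets the library batch them internally; the thread-based approach is mainly useful when summarization has to coexist with other async work.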
Troubleshooting

- If you see `Model not found`, verify the model name and internet connection.
- For `CUDA out of memory` errors, reduce the batch size or run on CPU by setting `device=-1` in the pipeline.
- If summaries are too short or too long, adjust the `min_length` and `max_length` parameters.
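A related failure mode is input that exceeds the model's maximum input length (1024 tokens for `facebook/bart-large-cnn`), which causes truncation or errors. One common workaround is to split long text into chunks and summarize each chunk separately. A rough sketch, using whitespace words as a stand-in for real tokens (a tokenizer-based split would be more accurate):

```python
def chunk_text(text, max_words=400):
    """Split text into chunks of at most max_words whitespace words.

    Word counts only approximate model tokens, so max_words is kept
    well below BART's 1024-token input limit.
    """
    words = text.split()
    return [
        ' '.join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

chunks = chunk_text("word " * 1000)  # 3 chunks of 400, 400, and 200 words
# Each chunk can then be summarized on its own:
# summaries = [summarizer(c, max_length=60, min_length=20)[0]['summary_text']
#              for c in chunks]
```

The per-chunk summaries can be concatenated, or fed through the summarizer once more for a summary of summaries.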
Key Takeaways

- Use Hugging Face's `pipeline` with task `summarization` for easy text summarization.
- Choose pretrained models like `facebook/bart-large-cnn` for high-quality summaries.
- Adjust `max_length` and `min_length` to control summary size.
- Handle errors by checking model names, internet connection, and device memory.
- Async summarization can be done using Python's `asyncio` with `pipeline`.