
How to classify text with Hugging Face

Quick answer
Use the Hugging Face transformers library with a pretrained text classification model such as distilbert-base-uncased-finetuned-sst-2-english. Create a pipeline, which loads the model and its tokenizer for you, then pass your text to it to get a predicted label and a confidence score.

PREREQUISITES

  • Python 3.8+
  • pip install "transformers>=4.30.0" (quote the version spec so the shell does not treat > as a redirect)
  • pip install torch (or tensorflow)
  • Internet connection for model download

Setup

Install the Hugging Face transformers library and a backend like torch or tensorflow. The transformers package provides pretrained models and pipelines for text classification.

bash
pip install transformers torch
output
Collecting transformers
Collecting torch
Successfully installed torch-2.0.1 transformers-4.32.1
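To confirm the install before moving on, a quick check like the one below can help. It is a minimal sketch: the helper name check_backends is hypothetical, and it only reports whether each package can be found on the import path, not whether its version meets the prerequisites.

```python
import importlib.util

def check_backends():
    """Report which of the packages used in this guide can be found."""
    names = ["transformers", "torch", "tensorflow"]
    # find_spec locates a top-level module without importing it
    return {name: importlib.util.find_spec(name) is not None for name in names}

status = check_backends()
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")

if not status["transformers"]:
    print('Install transformers: pip install "transformers>=4.30.0"')
elif not (status["torch"] or status["tensorflow"]):
    print("Install a backend: pip install torch (or tensorflow)")
```

You need transformers plus at least one backend; having both torch and tensorflow is fine but not required.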

Step by step

Use the pipeline API for text classification. This example uses the distilbert-base-uncased-finetuned-sst-2-english model for sentiment classification.

python
from transformers import pipeline

# Initialize text classification pipeline
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")

# Input text
text = "I love using Hugging Face Transformers!"

# Run classification
result = classifier(text)
print(result)
output
[{'label': 'POSITIVE', 'score': 0.9998}]
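Under the hood, the pipeline tokenizes the text, runs the model to get one raw score (logit) per label, and, for a single-label model like this one, turns the logits into probabilities with a softmax before reporting the top label. The sketch below reproduces only that last step in plain Python. The logits are made-up values and the helper names are hypothetical; the 0 → NEGATIVE, 1 → POSITIVE mapping follows the SST-2 model's configuration.

```python
import math

def softmax(logits):
    # Subtract the largest logit first so math.exp cannot overflow
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_prediction(logits, id2label):
    """Return the highest-probability label in the pipeline's output shape."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return [{"label": id2label[best], "score": probs[best]}]

# Hypothetical raw logits for one sentence; label mapping as in the SST-2 model config
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
print(top_prediction([-4.2, 4.6], id2label))
```

This is why the score reads as a probability: the softmax outputs across all labels sum to 1.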

Common variations

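  • Use other pretrained models like bert-base-uncased-finetuned-mrpc for paraphrase detection. Paraphrase models classify sentence pairs, so pass each pair as a dict with text and text_pair keys.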
  • Run classification on a batch of texts by passing a list to the pipeline.
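  • Call the pipeline from asyncio code by running it in a worker thread. Pipelines are synchronous and cannot be awaited directly, so offloading the call keeps the event loop responsive.
python
import asyncio
from functools import partial

from transformers import pipeline

# Load the model once at startup and reuse it for every request
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")

async def async_classify(texts):
    # Run the blocking pipeline call in the default thread pool.
    # top_k=None (which replaces the deprecated return_all_scores=True)
    # returns the score for every label, sorted highest first.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, partial(classifier, texts, top_k=None))

texts = ["I love this!", "This is bad."]

async def main():
    results = await async_classify(texts)
    print(results)

asyncio.run(main())
output
[[{'label': 'POSITIVE', 'score': 0.999}, {'label': 'NEGATIVE', 'score': 0.001}], [{'label': 'NEGATIVE', 'score': 0.998}, {'label': 'POSITIVE', 'score': 0.002}]]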
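For large lists, passing everything in one call holds all inputs and results at once. One way to bound that is to feed the pipeline fixed-size chunks, as in this sketch. The helpers chunked and classify_in_chunks are hypothetical, and fake_classifier is a stand-in so the example runs offline; swap in the real classifier object. The pipeline also accepts its own batch_size argument, which controls how many inputs go through the model per forward pass and is independent of the chunk size here.

```python
from typing import Callable, Iterator, List

def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    # Yield consecutive slices of at most `size` items
    for start in range(0, len(items), size):
        yield items[start:start + size]

def classify_in_chunks(classifier: Callable, texts: List[str], chunk_size: int = 32) -> List[dict]:
    """Send texts to `classifier` one chunk at a time and flatten the results.

    `classifier` is assumed to behave like the text-classification pipeline:
    given a list of strings, it returns one {'label', 'score'} dict per string.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    results: List[dict] = []
    for chunk in chunked(texts, chunk_size):
        results.extend(classifier(chunk))
    return results

# Hypothetical stand-in for the real pipeline so the sketch runs offline
def fake_classifier(batch: List[str]) -> List[dict]:
    return [{"label": "POSITIVE" if "love" in t else "NEGATIVE", "score": 0.9} for t in batch]

texts = ["I love this!", "This is bad.", "I love it too", "Meh."]
print(classify_in_chunks(fake_classifier, texts, chunk_size=3))
```

With the real pipeline, the call would look like classify_in_chunks(classifier, texts, chunk_size=64).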

Troubleshooting

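  • If you get an OSError saying the model can't be found, check the model ID for typos and confirm you have internet access, or download the model in advance and load it from a local directory.
  • For CUDA errors, check that your PyTorch build matches your GPU driver and CUDA version (torch.cuda.is_available() should return True), or run on CPU by passing device=-1 to pipeline().
  • If results seem wrong, remember that the SST-2 model only predicts POSITIVE or NEGATIVE, so neutral or off-topic text is forced into one of those labels. Choose a model fine-tuned for your labels and domain.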
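Two small helpers for the last two items, sketched under stated assumptions: pick_device chooses the device argument for pipeline() (0 for the first CUDA GPU, -1 for CPU), and label_or_uncertain flags predictions whose top score falls below a threshold. Both names are hypothetical, and the 0.75 threshold is an arbitrary example value you would tune on your own data.

```python
import importlib.util
from typing import Dict, List

def pick_device() -> int:
    """Return a device index for pipeline(device=...): 0 = first CUDA GPU, -1 = CPU."""
    if importlib.util.find_spec("torch") is None:
        return -1
    import torch  # imported lazily so this helper also runs without PyTorch
    return 0 if torch.cuda.is_available() else -1

def label_or_uncertain(result: List[Dict], threshold: float = 0.75) -> str:
    """Return the top label of one result, or 'UNCERTAIN' if its score is below threshold.

    The 0.75 default is an arbitrary example value, not a recommended setting.
    """
    best = max(result, key=lambda r: r["score"])
    return best["label"] if best["score"] >= threshold else "UNCERTAIN"

print("device:", pick_device())
print(label_or_uncertain([{"label": "POSITIVE", "score": 0.9998}]))
print(label_or_uncertain([{"label": "NEGATIVE", "score": 0.55}, {"label": "POSITIVE", "score": 0.45}]))
```

In practice you would pass device=pick_device() when creating the pipeline, and apply label_or_uncertain to each per-input result.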

Key Takeaways

  • Use Hugging Face pipeline for quick text classification with pretrained models.
  • Pass single or batch texts to the pipeline for flexible classification.
  • Pipelines are synchronous; in asyncio code, run them in a worker thread so inference doesn't block the event loop.
  • Choose a model fine-tuned on your specific classification task for best accuracy.
Verified 2026-04 · distilbert-base-uncased-finetuned-sst-2-english, bert-base-uncased-finetuned-mrpc