How to compute sentence embeddings with Hugging Face
Quick answer
Use Hugging Face's transformers ecosystem with a pretrained model such as sentence-transformers/all-MiniLM-L6-v2 to compute sentence embeddings. Load the tokenizer and model, tokenize your sentences, and extract embeddings from the model's output.
Prerequisites
- Python 3.8+
- pip install transformers sentence-transformers torch
- Basic familiarity with Python
Setup
Install the required libraries using pip. You need transformers for model loading, sentence-transformers for easy embedding extraction, and torch as the backend.
pip install transformers sentence-transformers torch
Step by step
This example shows how to compute sentence embeddings using the SentenceTransformer class from the sentence-transformers library, which wraps Hugging Face models optimized for sentence embeddings.
from sentence_transformers import SentenceTransformer
# Load a pretrained sentence transformer model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
# List of sentences to embed
sentences = [
    "Hugging Face makes NLP easy.",
    "Sentence embeddings capture semantic meaning."
]
# Compute embeddings
embeddings = model.encode(sentences)
# Print the shape and first embedding vector
print(f"Embeddings shape: {embeddings.shape}")
print(f"First embedding vector:\n{embeddings[0]}")
Output
Embeddings shape: (2, 384)
First embedding vector:
[ 0.01234567 -0.02345678  0.03456789 ...  0.04567890 -0.05678901  0.06789012]
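Once you have embeddings, a common next step is to compare them with cosine similarity. A minimal sketch using NumPy (the short vectors below are stand-ins for real 384-dimensional embeddings):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity is the dot product of the two vectors
    # divided by the product of their L2 norms.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; in practice pass embeddings[0] and embeddings[1]
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal -> 0.0
```

With sentence-transformers installed, its sentence_transformers.util.cos_sim helper does the same for batches of embeddings.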
Common variations
- Use transformers directly by loading a model and tokenizer, then mean-pooling the last hidden states for embeddings.
- Switch to other models like all-mpnet-base-v2 for higher accuracy.
- Use GPU by moving the model to CUDA with model.to('cuda') if available.
from transformers import AutoTokenizer, AutoModel
import torch
# Load tokenizer and model
model_name = 'sentence-transformers/all-MiniLM-L6-v2'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
sentences = ["Hugging Face makes NLP easy.", "Sentence embeddings capture semantic meaning."]
# Tokenize
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Forward pass
with torch.no_grad():
    outputs = model(**inputs)
# Simple mean pooling over all token positions (note: this averages padding tokens too)
embeddings = outputs.last_hidden_state.mean(dim=1)
print(f"Embeddings shape: {embeddings.shape}")
print(f"First embedding vector:\n{embeddings[0]}")
Output
Embeddings shape: torch.Size([2, 384])
First embedding vector:
tensor([ 0.0123, -0.0235,  0.0346,  ...,  0.0457, -0.0568,  0.0679])
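The plain mean above also averages the hidden states of padding tokens, which can skew embeddings in a padded batch. A mask-aware mean pooling weights only real tokens; a self-contained sketch, with synthetic tensors standing in for model outputs:

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    # Zero out padded positions, then divide by the number of real tokens
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts

# Synthetic stand-ins: batch of 2, sequence length 4, hidden size 3
hidden = torch.ones(2, 4, 3)
hidden[0, 2:] = 100.0  # padded positions hold irrelevant values
mask = torch.tensor([[1, 1, 0, 0], [1, 1, 1, 1]])  # first sentence: 2 real tokens
print(mean_pool(hidden, mask))  # both rows are all ones: padding was ignored
```

In the transformers pipeline above, you would call mean_pool(outputs.last_hidden_state, inputs['attention_mask']) in place of the plain mean.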
Troubleshooting
- If you get CUDA out-of-memory errors, reduce the batch size or run on CPU with model.to('cpu').
- If embeddings are all zeros or identical, check that you loaded the intended model and that the inputs are tokenized properly.
- Install sentence-transformers to simplify embedding extraction instead of writing manual pooling.
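For the out-of-memory case, encoding in smaller chunks is the usual fix. SentenceTransformer.encode accepts a batch_size argument; with the raw transformers pipeline you can chunk the input list yourself. A minimal sketch (the helper name batched is my own):

```python
def batched(items, batch_size):
    # Yield successive fixed-size chunks so each forward pass stays small
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

sentences = [f"sentence {i}" for i in range(10)]
print([len(chunk) for chunk in batched(sentences, 4)])  # -> [4, 4, 2]
```

Each chunk would then be tokenized and passed through the model separately, with the per-chunk embeddings concatenated at the end.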
Key Takeaways
- Use the sentence-transformers library for easy, optimized sentence embeddings.
- Pretrained models like all-MiniLM-L6-v2 produce 384-dimensional embeddings suitable for semantic tasks.
- You can use Hugging Face transformers directly with mean pooling for custom embedding extraction.
- Move models to GPU with model.to('cuda') for faster computation when available.
- Troubleshoot by checking tokenization, model loading, and device memory.
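The GPU takeaway can be made robust with a small device-selection check; note that with raw transformers the tokenized inputs must be moved to the same device as the model:

```python
import torch

# Fall back to CPU when no GPU is present
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

# Sketch of usage (model and inputs as in the examples above):
# model = model.to(device)
# inputs = inputs.to(device)  # BatchEncoding supports .to(device)
```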