How to compute sentence embeddings with Hugging Face
Quick answer
Use Hugging Face's transformers ecosystem with a pretrained model such as sentence-transformers/all-MiniLM-L6-v2 to compute sentence embeddings. Load the tokenizer and model, tokenize your sentences, and extract embeddings from the model's output.
Prerequisites
- Python 3.8+
- pip install transformers sentence-transformers torch
- Basic familiarity with Python
Setup
Install the required libraries using pip. You need transformers for model loading, sentence-transformers for easy embedding extraction, and torch as the backend.
pip install transformers sentence-transformers torch
Step by step
This example shows how to compute sentence embeddings using the SentenceTransformer class from the sentence-transformers library, which wraps Hugging Face models optimized for sentence embeddings.
from sentence_transformers import SentenceTransformer
# Load a pretrained sentence transformer model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
# List of sentences to embed
sentences = [
    "Hugging Face makes NLP easy.",
    "Sentence embeddings capture semantic meaning."
]
# Compute embeddings
embeddings = model.encode(sentences)
# Print the shape and first embedding vector
print(f"Embeddings shape: {embeddings.shape}")
print(f"First embedding vector:\n{embeddings[0]}")
Output
Embeddings shape: (2, 384)
First embedding vector:
[ 0.01234567 -0.02345678  0.03456789 ...  0.04567890 -0.05678901  0.06789012]
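Once you have embeddings, a common next step is to compare them with cosine similarity. A minimal sketch using NumPy (the short vectors below are stand-ins for real 384-dimensional embeddings):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity is the dot product of the two vectors
    # divided by the product of their L2 norms.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; in practice pass embeddings[0] and embeddings[1]
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal -> 0.0
```

With sentence-transformers installed, its sentence_transformers.util.cos_sim helper does the same for batches of embeddings.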
Common variations
- Use transformers directly by loading a model and tokenizer, then mean-pooling the last hidden states for embeddings.
- Switch to other models like all-mpnet-base-v2 for higher accuracy.
- Use GPU by moving the model to CUDA with model.to('cuda') if available.
from transformers import AutoTokenizer, AutoModel
import torch
# Load tokenizer and model
model_name = 'sentence-transformers/all-MiniLM-L6-v2'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
sentences = ["Hugging Face makes NLP easy.", "Sentence embeddings capture semantic meaning."]
# Tokenize
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Forward pass
with torch.no_grad():
    outputs = model(**inputs)
# Simple mean pooling over all token positions (note: this averages padding tokens too)
embeddings = outputs.last_hidden_state.mean(dim=1)
print(f"Embeddings shape: {embeddings.shape}")
print(f"First embedding vector:\n{embeddings[0]}")
Output
Embeddings shape: torch.Size([2, 384])
First embedding vector:
tensor([ 0.0123, -0.0235,  0.0346,  ...,  0.0457, -0.0568,  0.0679])
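The plain mean above also averages the hidden states of padding tokens, which can skew embeddings in a padded batch. A mask-aware mean pooling weights only real tokens; a self-contained sketch, with synthetic tensors standing in for model outputs:

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    # Zero out padded positions, then divide by the number of real tokens
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts

# Synthetic stand-ins: batch of 2, sequence length 4, hidden size 3
hidden = torch.ones(2, 4, 3)
hidden[0, 2:] = 100.0  # padded positions hold irrelevant values
mask = torch.tensor([[1, 1, 0, 0], [1, 1, 1, 1]])  # first sentence: 2 real tokens
print(mean_pool(hidden, mask))  # both rows are all ones: padding was ignored
```

In the transformers pipeline above, you would call mean_pool(outputs.last_hidden_state, inputs['attention_mask']) in place of the plain mean.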
Troubleshooting
- If you get CUDA out-of-memory errors, reduce the batch size or run on CPU with model.to('cpu').
- If embeddings are all zeros or identical, check that you loaded the intended model and that the inputs are tokenized properly.
- Install sentence-transformers to simplify embedding extraction instead of writing manual pooling.
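For the out-of-memory case, encoding in smaller chunks is the usual fix. SentenceTransformer.encode accepts a batch_size argument; with the raw transformers pipeline you can chunk the input list yourself. A minimal sketch (the helper name batched is my own):

```python
def batched(items, batch_size):
    # Yield successive fixed-size chunks so each forward pass stays small
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

sentences = [f"sentence {i}" for i in range(10)]
print([len(chunk) for chunk in batched(sentences, 4)])  # -> [4, 4, 2]
```

Each chunk would then be tokenized and passed through the model separately, with the per-chunk embeddings concatenated at the end.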
Key Takeaways
- Use the sentence-transformers library for easy, optimized sentence embeddings.
- Pretrained models like all-MiniLM-L6-v2 produce 384-dimensional embeddings suitable for semantic tasks.
- You can use Hugging Face transformers directly with mean pooling for custom embedding extraction.
- Move models to GPU with model.to('cuda') for faster computation when available.
- Troubleshoot by checking tokenization, model loading, and device memory.
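The GPU takeaway can be made robust with a small device-selection check; note that with raw transformers the tokenized inputs must be moved to the same device as the model:

```python
import torch

# Fall back to CPU when no GPU is present
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

# Sketch of usage (model and inputs as in the examples above):
# model = model.to(device)
# inputs = inputs.to(device)  # BatchEncoding supports .to(device)
```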