How to use sentence-transformers in Python
Direct answer
Use the sentence-transformers library in Python by loading a pre-trained model with SentenceTransformer and calling encode() on your sentences to get embeddings.
Setup
Install
pip install sentence-transformers
Imports
from sentence_transformers import SentenceTransformer
Examples
in["This is a test sentence.", "Here is another one."]
out[embedding vector array of floats]
in["The quick brown fox jumps over the lazy dog."]
out[embedding vector array of floats]
Integration steps
- Install the sentence-transformers package via pip.
- Import SentenceTransformer from sentence_transformers.
- Load a pre-trained model like 'all-MiniLM-L6-v2' using SentenceTransformer.
- Call the encode() method on a list of sentences to get their embeddings.
- Use or store the resulting numpy arrays for downstream tasks.
Full code
from sentence_transformers import SentenceTransformer
# Load pre-trained sentence transformer model
model = SentenceTransformer('all-MiniLM-L6-v2')
# List of sentences to encode
sentences = [
"This is a test sentence.",
"Here is another one."
]
# Generate embeddings
embeddings = model.encode(sentences)
# Print the embeddings
for i, embedding in enumerate(embeddings):
    print(f"Sentence {i+1} embedding (first 5 values):", embedding[:5])
Output
Sentence 1 embedding (first 5 values): [ 0.01234567 -0.02345678  0.03456789 -0.04567890  0.05678901]
Sentence 2 embedding (first 5 values): [ 0.06789012 -0.07890123  0.08901234 -0.09012345  0.10123456]
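A common downstream task for these embeddings is comparing sentences by cosine similarity. Below is a minimal NumPy sketch; the small vectors are illustrative stand-ins for real model.encode() output, so the example runs without downloading a model (sentence-transformers also provides a util.cos_sim helper for the same computation).

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity of two 1-D embedding vectors:
    # dot product divided by the product of their norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative vectors standing in for model.encode() output
emb1 = np.array([0.01, -0.02, 0.03, -0.04, 0.05])
emb2 = np.array([0.06, -0.07, 0.08, -0.09, 0.10])
print(cosine_similarity(emb1, emb2))
```

With real output from encode(), pass individual rows of the returned array, e.g. cosine_similarity(embeddings[0], embeddings[1]).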
API trace
Request
{"model_name_or_path": "all-MiniLM-L6-v2", "sentences": ["This is a test sentence.", "Here is another one."]}
Response
{"embeddings": [[float, float, ...], [float, float, ...]]}
Extract
embeddings = model.encode(sentences)
Variants
Batch Encoding for Large Datasets ›
Use when encoding large lists of sentences to optimize memory and speed.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = [f"Sentence {i}" for i in range(1000)]
# Encode in batches to save memory
embeddings = model.encode(sentences, batch_size=64, show_progress_bar=True)
print(f"Encoded {len(embeddings)} sentences.")
Using GPU Acceleration ›
Use when you have a CUDA-enabled GPU to speed up embedding generation.
from sentence_transformers import SentenceTransformer
import torch
model = SentenceTransformer('all-MiniLM-L6-v2')
if torch.cuda.is_available():
    model = model.to('cuda')
sentences = ["This is a test sentence."]
embeddings = model.encode(sentences)
print(embeddings)
Using Transformers Directly for Custom Pipelines ›
Use when you want more control over the embedding process or to customize pooling.
from transformers import AutoTokenizer, AutoModel
import torch
model_name = 'sentence-transformers/all-MiniLM-L6-v2'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
sentences = ["This is a test sentence."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)
# Simple mean pooling over token vectors (ignores the attention mask;
# adequate for unpadded single-sentence batches like this one)
embeddings = outputs.last_hidden_state.mean(dim=1)
print(embeddings)
Performance
Latency: ~50-200 ms per sentence on CPU, ~10-50 ms on GPU
Cost: Free and open-source; no API cost
Rate limits: None; fully local execution
- Batch encode multiple sentences to reduce overhead.
- Use smaller models like 'all-MiniLM-L6-v2' for faster inference.
- Avoid encoding single sentences repeatedly; cache embeddings if possible.
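The last tip above (cache embeddings rather than re-encoding the same sentence) can be sketched with a plain dict keyed by sentence text. The toy lambda encoder is a stand-in so the sketch runs without downloading a model; in real code you would pass model.encode in its place.

```python
class EmbeddingCache:
    """Cache sentence embeddings so each sentence is encoded at most once."""

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn  # callable mapping a sentence to a vector
        self.cache = {}
        self.misses = 0  # counts actual encoder calls

    def get(self, sentence):
        if sentence not in self.cache:
            self.misses += 1
            self.cache[sentence] = self.encode_fn(sentence)
        return self.cache[sentence]

# Toy encoder standing in for model.encode
cache = EmbeddingCache(lambda s: [float(len(s))])
cache.get("hello")
cache.get("hello")
print(cache.misses)  # 1 — the second call was served from the cache
```

The same pattern extends to persisting the dict to disk (e.g. with pickle or numpy.save) between runs.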
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard encode() | ~50-200ms per sentence CPU | Free | Small to medium datasets |
| Batch encode() | ~10-50ms per sentence GPU | Free | Large datasets, faster throughput |
| Custom transformers pipeline | ~100-300ms CPU/GPU | Free | Custom embedding strategies |
Quick tip
Use the 'all-MiniLM-L6-v2' model for fast, lightweight, and reasonably accurate sentence embeddings in most cases.
Common mistake
Beginners often pass a single string instead of a list to encode(); the result is then a 1-D array rather than a 2-D batch of embeddings, which can cause shape errors or unexpected results downstream.
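One defensive pattern is to wrap a lone string in a list before encoding, so the result is always a 2-D (n_sentences, dim) array. In the sketch below, fake_encode is a hypothetical stand-in for model.encode so the example runs without the library installed.

```python
import numpy as np

def encode_as_batch(encode_fn, sentences):
    # Always hand the encoder a list, so output shape is (n_sentences, dim)
    if isinstance(sentences, str):
        sentences = [sentences]
    return np.asarray(encode_fn(sentences))

# Hypothetical stand-in encoder mapping each sentence to a toy 2-D vector
def fake_encode(sentences):
    return [[float(len(s)), 1.0] for s in sentences]

print(encode_as_batch(fake_encode, "just one sentence").shape)  # prints (1, 2)
```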