How to use sentence-transformers in Python
Direct answer
Use the sentence-transformers Python library by loading a pre-trained model with SentenceTransformer and calling encode() on your text to get embeddings.
Setup
Install
pip install sentence-transformers
Imports
from sentence_transformers import SentenceTransformer
Examples
Input: "Hello world"
Output: [0.123, -0.456, 0.789, ...]  # 384-dimensional embedding vector for 'all-MiniLM-L6-v2'
Input: "The quick brown fox jumps over the lazy dog"
Output: [0.234, -0.345, 0.567, ...]  # embedding vector representing the sentence
Input: "" (empty string)
Output: a vector is still returned (the pooled embedding of the empty token sequence), so filter out empty strings yourself if they are not meaningful
Integration steps
- Install the sentence-transformers package via pip
- Import SentenceTransformer from sentence_transformers
- Load a pre-trained model like 'all-MiniLM-L6-v2' using SentenceTransformer()
- Call the encode() method on your input text to get the embedding vector
- Use or store the resulting vector for downstream tasks like similarity or clustering
Full code
from sentence_transformers import SentenceTransformer
# Load a pre-trained sentence-transformers model
model = SentenceTransformer('all-MiniLM-L6-v2')
# Input text to embed
texts = [
"Hello world",
"The quick brown fox jumps over the lazy dog"
]
# Generate embeddings
embeddings = model.encode(texts)
# Print embeddings
for text, emb in zip(texts, embeddings):
    print(f"Text: {text}\nEmbedding (first 5 dims): {emb[:5]}\n")
Output
Text: Hello world
Embedding (first 5 dims): [ 0.1234 -0.4567  0.7890  0.2345 -0.3456]

Text: The quick brown fox jumps over the lazy dog
Embedding (first 5 dims): [ 0.2345 -0.3456  0.5678  0.1234 -0.2345]
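Since similarity is the most common downstream use of these vectors, here is a minimal cosine-similarity sketch using plain numpy. The vectors below are illustrative stand-ins for what model.encode() returns, not real model output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: dot product divided by the product of the norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; in practice these come from model.encode(...)
emb_a = np.array([0.1234, -0.4567, 0.7890])
emb_b = np.array([0.2345, -0.3456, 0.5678])

print(cosine_similarity(emb_a, emb_b))
```

The result ranges from -1 (opposite) to 1 (identical direction); for sentence embeddings, higher values indicate more semantically similar text.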
API trace
Request
N/A (local library call: model.encode(texts))
Response
[[float, float, ...], [float, float, ...]]  # one embedding vector per input text
Extract
embeddings = model.encode(texts)  # a numpy array of shape (n_texts, embedding_dim) by default
Variants
Batch encoding for large text lists ›
Use when encoding large lists of sentences efficiently with batching and progress feedback.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
texts = ["sentence 1", "sentence 2", "sentence 3"]  # extend with your full list
embeddings = model.encode(texts, batch_size=32, show_progress_bar=True)
print(embeddings.shape)  # (num_texts, 384) for 'all-MiniLM-L6-v2'
Encoding single sentence with normalization ›
Use when you want normalized embeddings for cosine similarity comparisons.
from sentence_transformers import SentenceTransformer
import numpy as np
model = SentenceTransformer('all-MiniLM-L6-v2')
sentence = "Example sentence"
embedding = model.encode(sentence, normalize_embeddings=True)
print(np.linalg.norm(embedding))  # Should be close to 1.0
Using GPU acceleration ›
Use when you have a CUDA-enabled GPU to speed up embedding generation.
from sentence_transformers import SentenceTransformer
import torch
model = SentenceTransformer('all-MiniLM-L6-v2')
if torch.cuda.is_available():
    model = model.to('cuda')
texts = ["GPU accelerated embedding"]
embeddings = model.encode(texts)
print(embeddings)
Performance
Latency: ~50-200 ms per sentence on CPU, ~10-50 ms on GPU for 'all-MiniLM-L6-v2'
Cost: free and local; no API cost since it is a local Python library
Rate limits: none (local execution)
- Batch multiple sentences in one call to reduce overhead
- Use smaller models like 'all-MiniLM-L6-v2' for faster embeddings
- Avoid encoding empty strings to save compute
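The last tip above can be applied with a simple pre-filter before calling encode(); a minimal sketch (variable names are illustrative):

```python
texts = ["Hello world", "", "The quick brown fox", "   "]

# Drop empty and whitespace-only strings before calling model.encode()
non_empty = [t for t in texts if t.strip()]

print(non_empty)  # ['Hello world', 'The quick brown fox']
```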
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Local CPU encoding | ~100-200ms per sentence | Free | Small to medium batch embedding |
| Local GPU encoding | ~10-50ms per sentence | Free | High throughput embedding with GPU |
| API-based embedding (OpenAI) | ~500-800ms per request | Paid | Cloud-based embedding with managed service |
Quick tip
Use normalize_embeddings=True in encode() to get unit vectors for cosine similarity.
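With unit vectors, cosine similarity reduces to a plain dot product, which is cheaper to compute at scale. A sketch with numpy stand-in vectors (not real model output):

```python
import numpy as np

# Stand-ins for two already-normalized embeddings (each has norm 1.0)
a = np.array([0.6, 0.8])
b = np.array([0.8, 0.6])

dot = float(np.dot(a, b))
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(dot, cosine)  # identical for unit vectors
```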
Common mistake
Forgetting to install the sentence-transformers package or misspelling the model name causes runtime errors (typically a ModuleNotFoundError for the missing package, or an error at load time when the model name cannot be resolved).
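One way to fail fast on the missing-package mistake is to check importability up front; a minimal sketch using only the standard library (check_dependency is an illustrative helper, not part of sentence-transformers):

```python
import importlib.util

def check_dependency(module_name: str) -> bool:
    # True if the module can be imported, without actually importing it
    return importlib.util.find_spec(module_name) is not None

if not check_dependency("sentence_transformers"):
    print("Missing dependency. Run: pip install sentence-transformers")
```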