Code beginner · 3 min read

How to use sentence-transformers in Python

Direct answer
Use the sentence-transformers Python library by loading a pre-trained model with SentenceTransformer and calling encode() on your text to get embeddings.
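A minimal sketch, assuming the widely used 'all-MiniLM-L6-v2' checkpoint (any model name from the sentence-transformers hub works the same way):

python
from sentence_transformers import SentenceTransformer

# Downloads the model on first use, then loads it from the local cache
model = SentenceTransformer('all-MiniLM-L6-v2')

# encode() maps a sentence to a fixed-size vector
embedding = model.encode("Hello world")
print(embedding.shape)  # (384,) for this model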

Setup

Install
bash
pip install sentence-transformers
Imports
python
from sentence_transformers import SentenceTransformer

Examples

in: Hello world
out: [0.123, -0.456, 0.789, ...] # 384-dimensional embedding vector for all-MiniLM-L6-v2
in: The quick brown fox jumps over the lazy dog
out: [0.234, -0.345, 0.567, ...] # embedding vector representing the sentence
in: (empty string)
out: [...] # an empty string still yields a full 384-dim vector; an empty list yields an empty array
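To verify these shapes yourself, a quick check (same model, 'all-MiniLM-L6-v2'):

python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# A normal sentence yields a 384-dimensional vector for this model
print(model.encode("Hello world").shape)  # (384,)

# An empty string is still tokenized (special tokens only),
# so it also yields a full-size vector, not an empty one
print(model.encode("").shape)  # (384,)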

Integration steps

  1. Install the sentence-transformers package via pip
  2. Import SentenceTransformer from sentence_transformers
  3. Load a pre-trained model like 'all-MiniLM-L6-v2' using SentenceTransformer()
  4. Call the encode() method on your input text to get the embedding vector
  5. Use or store the resulting vector for downstream tasks like similarity or clustering (see the clustering sketch below)
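As one example of step 5, a small clustering sketch; it assumes scikit-learn is installed (pip install scikit-learn), which sentence-transformers itself does not require:

python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer('all-MiniLM-L6-v2')
texts = [
    "The cat sat on the mat",
    "A kitten napped on the rug",
    "Stock markets fell sharply today",
    "Investors reacted to the earnings report",
]

# Embed all sentences in one call, then cluster the vectors
embeddings = model.encode(texts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

for label, text in zip(labels, texts):
    print(label, text)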

Full code

python
from sentence_transformers import SentenceTransformer

# Load a pre-trained sentence-transformers model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Input text to embed
texts = [
    "Hello world",
    "The quick brown fox jumps over the lazy dog"
]

# Generate embeddings
embeddings = model.encode(texts)

# Print embeddings
for text, emb in zip(texts, embeddings):
    print(f"Text: {text}\nEmbedding (first 5 dims): {emb[:5]}\n")
output
Text: Hello world
Embedding (first 5 dims): [ 0.1234 -0.4567  0.7890  0.2345 -0.3456]

Text: The quick brown fox jumps over the lazy dog
Embedding (first 5 dims): [ 0.2345 -0.3456  0.5678  0.1234 -0.2345]

API trace

Request
N/A (local library call: model.encode(texts))
Response
json
[[float, float, ...], [float, float, ...]]  # one embedding vector per input text
Extract
python
embeddings = model.encode(texts)  # a numpy array by default
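To inspect the actual return value, a short sketch (same local call, no API involved):

python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(["Hello world", "The quick brown fox jumps over the lazy dog"])

print(type(embeddings))  # <class 'numpy.ndarray'>
print(embeddings.shape)  # (2, 384): one row per input sentence
print(embeddings.dtype)  # float32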

Variants

Batch encoding for large text lists

Use when encoding large lists of sentences efficiently with batching and progress feedback.

python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
texts = ["sentence 1", "sentence 2", "sentence 3", ...]
embeddings = model.encode(texts, batch_size=32, show_progress_bar=True)
print(embeddings.shape)
Encoding single sentence with normalization

Use when you want normalized embeddings for cosine similarity comparisons.

python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('all-MiniLM-L6-v2')
sentence = "Example sentence"
embedding = model.encode(sentence, normalize_embeddings=True)
print(np.linalg.norm(embedding))  # Should be close to 1.0
Using GPU acceleration

Use when you have a CUDA-enabled GPU to speed up embedding generation.

python
from sentence_transformers import SentenceTransformer
import torch

# sentence-transformers moves the model to an available GPU automatically;
# passing device makes the choice explicit and falls back to CPU safely
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = SentenceTransformer('all-MiniLM-L6-v2', device=device)
texts = ["GPU accelerated embedding"]
embeddings = model.encode(texts)
print(embeddings)

Performance

Latency: ~50-200 ms per sentence on CPU, ~10-50 ms on GPU for 'all-MiniLM-L6-v2'
Cost: Free and local; no API cost since it's a local Python library
Rate limits: None (local execution)

  • Batch multiple sentences in one call to reduce overhead (see the timing sketch after the table below)
  • Use smaller models like 'all-MiniLM-L6-v2' for faster embeddings
  • Avoid encoding empty strings to save compute

Approach                     | Latency                  | Cost/call | Best for
Local CPU encoding           | ~100-200 ms per sentence | Free      | Small to medium batch embedding
Local GPU encoding           | ~10-50 ms per sentence   | Free      | High-throughput embedding with GPU
API-based embedding (OpenAI) | ~500-800 ms per request  | Paid      | Cloud-based embedding with managed service
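To see the batching effect from the first tip, a rough timing sketch (absolute numbers will vary by machine):

python
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
texts = [f"This is sentence number {i}" for i in range(200)]

# One call per sentence pays the per-call overhead 200 times
start = time.perf_counter()
for text in texts:
    model.encode(text)
print(f"one-by-one: {time.perf_counter() - start:.2f}s")

# One batched call lets the model process sentences 32 at a time
start = time.perf_counter()
model.encode(texts, batch_size=32)
print(f"batched:    {time.perf_counter() - start:.2f}s")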

Quick tip

Use <code>normalize_embeddings=True</code> in <code>encode()</code> to get unit vectors for cosine similarity.

Common mistake

Forgetting to install the sentence-transformers package or misspelling the model name causes runtime errors.
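Both failure modes can be caught up front; a defensive-loading sketch (the exact exception type for a bad model name depends on the underlying Hugging Face loader, so this catches broadly):

python
try:
    from sentence_transformers import SentenceTransformer
except ImportError:
    raise SystemExit("Install the package first: pip install sentence-transformers")

MODEL_NAME = 'all-MiniLM-L6-v2'  # a typo here fails at load time, not at encode time

try:
    model = SentenceTransformer(MODEL_NAME)
except Exception as exc:
    # A misspelled name usually fails while resolving the checkpoint on the Hugging Face Hub
    raise SystemExit(f"Could not load model {MODEL_NAME!r}: {exc}")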

Verified 2026-04 · all-MiniLM-L6-v2