Code beginner · 3 min read

How to use sentence-transformers in Python

Direct answer
Use the sentence-transformers library in Python by loading a pre-trained model with SentenceTransformer and calling encode() on your sentences to get embeddings.

Setup

Install
bash
pip install sentence-transformers
Imports
python
from sentence_transformers import SentenceTransformer

Examples

in["This is a test sentence.", "Here is another one."]
out[2 × 384 array of floats (one 384-dimensional embedding per sentence)]
in["The quick brown fox jumps over the lazy dog."]
out[1 × 384 array of floats]

Integration steps

  1. Install the sentence-transformers package via pip.
  2. Import SentenceTransformer from sentence_transformers.
  3. Load a pre-trained model like 'all-MiniLM-L6-v2' using SentenceTransformer.
  4. Call the encode() method on a list of sentences to get their embeddings.
  5. Use or store the resulting numpy arrays for downstream tasks.

Full code

python
from sentence_transformers import SentenceTransformer

# Load pre-trained sentence transformer model
model = SentenceTransformer('all-MiniLM-L6-v2')

# List of sentences to encode
sentences = [
    "This is a test sentence.",
    "Here is another one."
]

# Generate embeddings
embeddings = model.encode(sentences)

# Print the embeddings
for i, embedding in enumerate(embeddings):
    print(f"Sentence {i+1} embedding (first 5 values):", embedding[:5])
output (illustrative; exact values depend on the model weights)
Sentence 1 embedding (first 5 values): [ 0.01234567 -0.02345678  0.03456789 -0.04567890  0.05678901]
Sentence 2 embedding (first 5 values): [ 0.06789012 -0.07890123  0.08901234 -0.09012345  0.10123456]
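A typical downstream use of these embeddings is comparing sentences with cosine similarity. Below is a minimal NumPy sketch; the small vectors are illustrative stand-ins for the 384-dimensional arrays that `model.encode()` returns.

python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for embeddings[0] and embeddings[1] from the code above
emb_a = np.array([0.1, 0.3, -0.2, 0.4])
emb_b = np.array([0.2, 0.1, -0.1, 0.5])

print(cosine_similarity(emb_a, emb_b))

Values close to 1.0 mean the sentences are semantically similar; values near 0 mean they are unrelated.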

Call trace

sentence-transformers runs fully locally, so there is no HTTP request; this trace just shows the call's inputs and outputs.

Input
json
{"model_name_or_path": "all-MiniLM-L6-v2", "sentences": ["This is a test sentence.", "Here is another one."]}
Output
json
{"embeddings": [[float, float, ...], [float, float, ...]]}
Extract
python
embeddings = model.encode(sentences)

Variants

Batch Encoding for Large Datasets

Use when encoding large lists of sentences to optimize memory and speed.

python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

sentences = [f"Sentence {i}" for i in range(1000)]

# Encode in batches to save memory
embeddings = model.encode(sentences, batch_size=64, show_progress_bar=True)

print(f"Encoded {len(embeddings)} sentences.")
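For large datasets it is common to encode once and persist the result, so later runs can skip the model entirely. A sketch using NumPy's save/load; the random array is a stand-in for the `embeddings` returned by `model.encode()` above, so the snippet runs without downloading a model.

python
import os
import tempfile
import numpy as np

# Stand-in for model.encode(sentences): 1000 embeddings of dimension 384
embeddings = np.random.rand(1000, 384).astype(np.float32)

path = os.path.join(tempfile.mkdtemp(), "embeddings.npy")
np.save(path, embeddings)   # write once after encoding
loaded = np.load(path)      # reload later without re-encoding

print(loaded.shape)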
Using GPU Acceleration

Use when you have a CUDA-enabled GPU to speed up embedding generation.

python
from sentence_transformers import SentenceTransformer
import torch

# Select the device up front; encode() moves inputs to it automatically
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

sentences = ["This is a test sentence."]
embeddings = model.encode(sentences)
print(embeddings)
Using Transformers Directly for Custom Pipelines

Use when you want more control over the embedding process or to customize pooling.

python
from transformers import AutoTokenizer, AutoModel
import torch

model_name = 'sentence-transformers/all-MiniLM-L6-v2'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentences = ["This is a test sentence.", "Here is another one."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool over real tokens only: without the attention mask,
# padding positions would skew the averages in a padded batch
mask = inputs['attention_mask'].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings)

Performance

Latency: ~50-200 ms per sentence on CPU, ~10-50 ms on GPU
Cost: free and open-source; no API cost
Rate limits: none; fully local execution
  • Batch encode multiple sentences to reduce overhead.
  • Use smaller models like 'all-MiniLM-L6-v2' for faster inference.
  • Avoid encoding single sentences repeatedly; cache embeddings if possible.
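The caching tip above can be sketched as a small wrapper. Here `fake_encode` is a hypothetical stand-in for `model.encode` on a single sentence; the wrapper works with any callable that maps a sentence to a vector.

python
class CachedEncoder:
    """Caches per-sentence embeddings so repeated sentences are never re-encoded."""

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn
        self.cache = {}
        self.calls = 0  # how many times the underlying encoder actually ran

    def encode(self, sentence):
        if sentence not in self.cache:
            self.calls += 1
            self.cache[sentence] = self.encode_fn(sentence)
        return self.cache[sentence]

# Hypothetical stand-in for model.encode on one sentence
def fake_encode(sentence):
    return [float(len(sentence))]

encoder = CachedEncoder(fake_encode)
encoder.encode("hello")
encoder.encode("hello")  # second call is served from the cache
print(encoder.calls)

In a real pipeline you would pass `model.encode` (or a wrapper around it) as `encode_fn`.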
Approach | Latency | Cost/call | Best for
Standard encode() | ~50-200 ms/sentence (CPU) | Free | Small to medium datasets
Batched encode() | ~10-50 ms/sentence (GPU) | Free | Large datasets, faster throughput
Custom transformers pipeline | ~100-300 ms (CPU/GPU) | Free | Custom embedding strategies

Quick tip

Use the 'all-MiniLM-L6-v2' model for fast, lightweight, and reasonably accurate sentence embeddings in most cases.

Common mistake

Beginners often trip over the return shape: encode() on a single string returns one 1-D array, while encode() on a list returns a 2-D array with one row per sentence, so downstream code expecting one shape can break on the other.

Verified 2026-04 · all-MiniLM-L6-v2