How-to · Beginner · 3 min read

How to run embeddings locally

Quick answer
To run embeddings locally, use an open-source library such as sentence-transformers, which provides pre-trained models for generating vector embeddings without sending your data to an external API. Install the library via pip and load a model such as all-MiniLM-L6-v2 to encode text into embeddings on your machine (the model weights are downloaded once; after that, everything runs offline).

PREREQUISITES

  • Python 3.8+
  • pip install sentence-transformers
  • Basic Python programming knowledge

Setup

Install the sentence-transformers library, which provides easy access to many pre-trained embedding models that run locally without API calls.

Run the following command in your terminal:

bash
pip install sentence-transformers

Step by step

Here is a complete Python example to generate embeddings locally using sentence-transformers. It loads the all-MiniLM-L6-v2 model and encodes a list of texts into vectors.

python
from sentence_transformers import SentenceTransformer

# Load the pre-trained embedding model locally
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sample texts to embed
texts = [
    'OpenAI develops advanced AI models.',
    'Running embeddings locally improves privacy.',
    'Sentence-transformers is an open-source library.'
]

# Generate embeddings
embeddings = model.encode(texts)

# Print the shape and first embedding vector
print(f'Generated {len(embeddings)} embeddings with dimension {len(embeddings[0])}')
print('First embedding vector:', embeddings[0])

output
Generated 3 embeddings with dimension 384
First embedding vector: [ 0.01234567 -0.02345678 ... 0.03456789]
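Once you have embeddings, the most common next step is comparing them. Below is a minimal sketch of cosine similarity using NumPy; the toy 3-dimensional vectors are stand-ins for real model output (embeddings from all-MiniLM-L6-v2 would be 384-dimensional):

python
import numpy as np

# Toy stand-ins for model embeddings (real ones would be 384-dim)
a = np.array([0.1, 0.3, 0.9])
b = np.array([0.2, 0.25, 0.85])

# Cosine similarity: dot product of the L2-normalized vectors
def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(f'similarity: {cosine_similarity(a, b):.3f}')
print(f'self-similarity: {cosine_similarity(a, a):.3f}')  # always 1.000

sentence-transformers also ships a built-in helper, util.cos_sim, which computes the same quantity over whole batches of vectors at once.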

Common variations

For large inputs, encode() already processes texts in batches; tune its batch_size argument to trade memory for throughput. Other models such as all-mpnet-base-v2 offer higher accuracy at the cost of a larger download and slower inference. For GPU acceleration, install torch with CUDA support and load the model on the GPU.

python
import torch
from sentence_transformers import SentenceTransformer

# Load model on GPU if available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

texts = ['Example text for GPU embedding']
embeddings = model.encode(texts)
print('Embedding on device:', device)
print(embeddings[0])

output
Embedding on device: cuda   (prints cpu instead if no CUDA GPU is available)
[ 0.01234567 -0.02345678 ... 0.03456789]
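When embedding a very large corpus, you may also want to chunk the input yourself, for example to stream texts from disk or checkpoint progress between batches. A minimal chunking helper in plain Python; the commented line marks where the model.encode() call from the examples above would go (a placeholder keeps this sketch runnable on its own):

python
def chunked(items, size):
    """Yield successive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

texts = [f'document {i}' for i in range(10)]

all_embeddings = []
for batch in chunked(texts, 4):
    # In a real pipeline: all_embeddings.extend(model.encode(batch))
    all_embeddings.extend(batch)  # placeholder so the sketch runs standalone

print(f'{len(list(chunked(texts, 4)))} batches, {len(all_embeddings)} items total')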

Troubleshooting

  • If you get ImportError, ensure sentence-transformers is installed correctly.
  • If embeddings are slow, check if you can enable GPU acceleration by installing torch with CUDA.
  • For memory errors, reduce the batch_size argument to encode() or switch to a lighter model such as all-MiniLM-L6-v2.
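For the ImportError case, you can check which packages are actually installed without importing anything heavy. A small diagnostic sketch using the standard library's importlib (the package names checked are just the two this guide relies on):

python
import importlib.util

# Check for required packages without importing them
for pkg in ('sentence_transformers', 'torch'):
    spec = importlib.util.find_spec(pkg)
    status = 'installed' if spec else 'missing (pip install ' + pkg.replace('_', '-') + ')'
    print(f'{pkg}: {status}')
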

Key Takeaways

  • Use the open-source sentence-transformers library to run embeddings locally without API calls.
  • Choose models like all-MiniLM-L6-v2 for a good balance of speed and accuracy.
  • Enable GPU support by installing torch with CUDA and loading the model on the GPU device.
  • Local embeddings improve privacy and reduce dependency on external services.
  • Troubleshoot by verifying installations and adjusting batch sizes or model choice.
Verified 2026-04 · all-MiniLM-L6-v2, all-mpnet-base-v2