How to run embeddings locally
Quick answer
To run embeddings locally, use open-source libraries like sentence-transformers, which provide pre-trained models for generating vector embeddings without internet access. Install the library via pip and load a model such as all-MiniLM-L6-v2 to encode text into embeddings on your machine.

Prerequisites

- Python 3.8+
- sentence-transformers installed via pip
- Basic Python programming knowledge
Setup
Install the sentence-transformers library, which provides easy access to many pre-trained embedding models that run locally without API calls.
Run the following command in your terminal:
```shell
pip install sentence-transformers
```

Step by step
Here is a complete Python example to generate embeddings locally using sentence-transformers. It loads the all-MiniLM-L6-v2 model and encodes a list of texts into vectors.
```python
from sentence_transformers import SentenceTransformer

# Load the pre-trained embedding model locally
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sample texts to embed
texts = [
    'OpenAI develops advanced AI models.',
    'Running embeddings locally improves privacy.',
    'Sentence-transformers is an open-source library.'
]

# Generate embeddings
embeddings = model.encode(texts)

# Print the shape and first embedding vector
print(f'Generated {len(embeddings)} embeddings with dimension {len(embeddings[0])}')
print('First embedding vector:', embeddings[0])
```

Output:

```text
Generated 3 embeddings with dimension 384
First embedding vector: [ 0.01234567 -0.02345678 ... 0.03456789]
```
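Once you have vectors, the usual way to compare them is cosine similarity. Here is a minimal numpy sketch; the short vectors are stand-ins for real model.encode output, and cosine_similarity is a hypothetical helper, not part of sentence-transformers:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of L2 norms
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; in practice these come from model.encode(texts)
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 1.0, 0.0]
print(cosine_similarity(v1, v1))  # identical vectors -> 1.0
print(cosine_similarity(v1, v2))  # partial overlap -> 0.5
```

Scores near 1.0 mean the texts are semantically similar; scores near 0 mean they are unrelated.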
Common variations
For large inputs, encode in batches; model.encode accepts a batch_size argument so you can trade throughput against memory. Other models such as all-mpnet-base-v2 offer higher accuracy but are larger and slower. For GPU acceleration, install torch with CUDA support and load the model on the GPU.
```python
import torch
from sentence_transformers import SentenceTransformer

# Load model on GPU if available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

texts = ['Example text for GPU embedding']
embeddings = model.encode(texts)
print('Embedding on device:', device)
print(embeddings[0])
```

Output:

```text
Embedding on device: cuda
[ 0.01234567 -0.02345678 ... 0.03456789]
```
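If you want explicit control over batching for a large corpus, you can chunk the input yourself. This is a sketch under the assumption that encode_fn behaves like model.encode (returning one vector per text); the fake_encode stand-in avoids loading a real model:

```python
import numpy as np

def encode_in_batches(texts, encode_fn, batch_size=32):
    # Encode texts in fixed-size chunks and stack the results,
    # keeping peak memory proportional to batch_size.
    chunks = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    return np.vstack([encode_fn(chunk) for chunk in chunks])

# Stand-in encoder; in practice pass model.encode instead
fake_encode = lambda batch: np.ones((len(batch), 4))
embeddings = encode_in_batches(['some text'] * 100, fake_encode, batch_size=32)
print(embeddings.shape)  # (100, 4)
```

In most cases passing batch_size directly to model.encode is enough; manual chunking is mainly useful when you want to checkpoint progress or stream results to disk between batches.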
Troubleshooting
- If you get an ImportError, make sure sentence-transformers is installed correctly.
- If embeddings are slow, check whether you can enable GPU acceleration by installing torch with CUDA support.
- For memory errors, reduce the batch size or use a lighter model like all-MiniLM-L6-v2.
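A small sketch that automates the first two checks above; check_install is a hypothetical helper that reports whether sentence-transformers imports cleanly and whether torch can see a CUDA device:

```python
def check_install():
    # Report installation status without crashing if a package is missing
    status = {}
    try:
        import sentence_transformers  # noqa: F401
        status['sentence_transformers'] = True
    except ImportError:
        status['sentence_transformers'] = False
    try:
        import torch
        status['cuda'] = torch.cuda.is_available()
    except ImportError:
        status['cuda'] = False
    return status

print(check_install())
```

If 'sentence_transformers' is False, rerun the pip install; if 'cuda' is False on a machine with a GPU, your torch build likely lacks CUDA support.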
Key Takeaways
- Use the open-source sentence-transformers library to run embeddings locally without API calls.
- Choose models like all-MiniLM-L6-v2 for a good balance of speed and accuracy.
- Enable GPU support by installing torch with CUDA and loading the model on the GPU device.
- Local embeddings improve privacy and reduce dependency on external services.
- Troubleshoot by verifying installations and adjusting batch sizes or model choice.