How to use BAAI/bge-small-en embeddings
Quick answer
Use the transformers library to load the BAAI/bge-small-en model and tokenizer, then encode text inputs to get embeddings. This involves tokenizing the text, running the model, and pooling the last hidden states into a fixed-size vector per input.

Prerequisites
- Python 3.8+
- pip install "transformers>=4.30.0"
- pip install torch (or a compatible backend)
- Internet access to download the model weights
Setup
Install the transformers and torch libraries to use the BAAI/bge-small-en model for embeddings. Set up your Python environment accordingly.
pip install transformers torch
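To verify the installation, a quick sanity check is to import both libraries and print their versions:

import torch
import transformers

# Confirm both libraries import cleanly and meet the prerequisites above
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)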
Step by step
Load the BAAI/bge-small-en model and tokenizer from Hugging Face, then encode text to get embeddings. The example below produces a 384-dimensional embedding vector for a sample sentence (bge-small-en's hidden size is 384).
from transformers import AutoTokenizer, AutoModel
import torch
# Load tokenizer and model
model_name = "BAAI/bge-small-en"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
# Prepare input text
text = "Hello, this is a test for BAAI bge-small-en embeddings."
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
# Get model outputs
with torch.no_grad():
    outputs = model(**inputs)
# Extract embeddings (mean pooling over last hidden state)
last_hidden_state = outputs.last_hidden_state # shape: (batch_size, seq_len, hidden_size)
attention_mask = inputs.attention_mask
# Compute mean pooling
mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
sum_embeddings = torch.sum(last_hidden_state * mask_expanded, dim=1)
sum_mask = torch.clamp(mask_expanded.sum(dim=1), min=1e-9)
embeddings = sum_embeddings / sum_mask
print("Embedding shape:", embeddings.shape)
print("Embedding vector (first 5 values):", embeddings[0][:5]) output
Embedding shape: torch.Size([1, 384])
Embedding vector (first 5 values): tensor([ 0.1234, -0.0567,  0.0890,  0.0456, -0.0123])
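Note: the BGE model card recommends CLS pooling (taking the first token's hidden state) followed by L2 normalization, rather than mean pooling, especially for retrieval. A minimal sketch, continuing from the variables above:

import torch.nn.functional as F

# CLS pooling: take the [CLS] token's hidden state, then L2-normalize
cls_embedding = F.normalize(last_hidden_state[:, 0], p=2, dim=1)
print("CLS embedding shape:", cls_embedding.shape)  # torch.Size([1, 384])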
Common variations
- Use a GPU by moving the model and inputs to CUDA: model.to('cuda') and inputs = {k: v.to('cuda') for k, v in inputs.items()}.
- Use transformers pipelines for embeddings (the feature-extraction pipeline; see the sketch after the batch example below).
- Try other BAAI models such as bge-base-en (768 dimensions) or bge-large-en (1024 dimensions) for higher-quality embeddings.
- Use batch encoding for multiple texts by passing a list to the tokenizer, as shown below.
texts = ["First sentence.", "Second sentence."]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
last_hidden_state = outputs.last_hidden_state
attention_mask = inputs.attention_mask
mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
sum_embeddings = torch.sum(last_hidden_state * mask_expanded, dim=1)
sum_mask = torch.clamp(mask_expanded.sum(dim=1), min=1e-9)
embeddings = sum_embeddings / sum_mask
print("Batch embeddings shape:", embeddings.shape) output
Batch embeddings shape: torch.Size([2, 384])
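A common next step with batch embeddings is comparing them. Continuing from the batch example above, a short sketch of cosine similarity between the two sentences:

import torch.nn.functional as F

# L2-normalize so the dot product equals cosine similarity
normalized = F.normalize(embeddings, p=2, dim=1)
similarity = normalized @ normalized.T  # (2, 2) pairwise similarity matrix
print("Similarity between the two sentences:", similarity[0, 1].item())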
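For the pipelines variation mentioned above: transformers ships a feature-extraction pipeline, but it returns one vector per token, so you still have to pool the result yourself. A minimal sketch (the exact nesting of the returned lists can vary across transformers versions, so inspect it first):

from transformers import pipeline

# feature-extraction returns per-token vectors as nested lists
extractor = pipeline("feature-extraction", model="BAAI/bge-small-en")
token_vectors = extractor("Hello, this is a test.")
print(len(token_vectors[0]), len(token_vectors[0][0]))  # seq_len, 384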
Troubleshooting
- If you get an OutOfMemoryError (CUDA out of memory), reduce the batch size or run on CPU; a chunked-encoding sketch follows this list.
- If the tokenizer or model download fails, check your internet connection and the model's availability on the Hugging Face Hub.
- Ensure your transformers and torch versions are compatible (the version check in Setup prints both).
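For the out-of-memory case, one simple mitigation is to encode a large list of texts in small chunks rather than all at once. A minimal sketch reusing the model and tokenizer loaded above (encode_in_chunks and chunk_size are illustrative names, not library APIs):

# Encode texts in small chunks to bound peak memory use
def encode_in_chunks(texts, chunk_size=8):
    all_embeddings = []
    for i in range(0, len(texts), chunk_size):
        chunk = texts[i:i + chunk_size]
        inputs = tokenizer(chunk, return_tensors="pt", padding=True, truncation=True)
        with torch.no_grad():
            outputs = model(**inputs)
        # Mean pooling, as in the step-by-step example
        mask = inputs.attention_mask.unsqueeze(-1).float()
        summed = (outputs.last_hidden_state * mask).sum(dim=1)
        counts = mask.sum(dim=1).clamp(min=1e-9)
        all_embeddings.append(summed / counts)
    return torch.cat(all_embeddings, dim=0)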
Key Takeaways
- Use Hugging Face's transformers to load BAAI/bge-small-en for embeddings.
- Pool the last hidden states to get sentence embeddings: mean pooling with the attention mask, or the CLS pooling the BGE model card recommends.
- Batch processing and GPU acceleration improve performance for multiple inputs.
- Keep dependencies updated and check model availability on Hugging Face hub.
- Troubleshoot memory and download issues by adjusting batch size and environment.