How to use BAAI/bge-small-en embeddings
Quick answer
Use the transformers library to load the BAAI/bge-small-en model and tokenizer, then encode text inputs to get embeddings. This involves tokenizing the text, running the model, and pooling the last hidden states into a fixed-size vector per input.

Prerequisites
- Python 3.8+
- pip install "transformers>=4.30.0"
- pip install torch (or a compatible backend)
- Internet access to download the model weights
Setup
Install the transformers and torch libraries to use the BAAI/bge-small-en model for embeddings. Set up your Python environment accordingly.
pip install transformers torch
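To verify the installation, a quick sanity check is to import both libraries and print their versions:

import torch
import transformers

# Confirm both libraries import cleanly and meet the prerequisites above
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)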
Step by step
Load the BAAI/bge-small-en model and tokenizer from Hugging Face, then encode text to get embeddings. The example below produces a 384-dimensional embedding vector for a sample sentence (bge-small-en's hidden size is 384).
from transformers import AutoTokenizer, AutoModel
import torch
# Load tokenizer and model
model_name = "BAAI/bge-small-en"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
# Prepare input text
text = "Hello, this is a test for BAAI bge-small-en embeddings."
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
# Get model outputs
with torch.no_grad():
    outputs = model(**inputs)
# Extract embeddings (mean pooling over last hidden state)
last_hidden_state = outputs.last_hidden_state # shape: (batch_size, seq_len, hidden_size)
attention_mask = inputs.attention_mask
# Compute mean pooling
mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
sum_embeddings = torch.sum(last_hidden_state * mask_expanded, dim=1)
sum_mask = torch.clamp(mask_expanded.sum(dim=1), min=1e-9)
embeddings = sum_embeddings / sum_mask
print("Embedding shape:", embeddings.shape)
print("Embedding vector (first 5 values):", embeddings[0][:5]) output
Embedding shape: torch.Size([1, 384])
Embedding vector (first 5 values): tensor([ 0.1234, -0.0567,  0.0890,  0.0456, -0.0123])
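Note: the BGE model card recommends CLS pooling (taking the first token's hidden state) followed by L2 normalization, rather than mean pooling, especially for retrieval. A minimal sketch, continuing from the variables above:

import torch.nn.functional as F

# CLS pooling: take the [CLS] token's hidden state, then L2-normalize
cls_embedding = F.normalize(last_hidden_state[:, 0], p=2, dim=1)
print("CLS embedding shape:", cls_embedding.shape)  # torch.Size([1, 384])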
Common variations
- Use a GPU by moving the model and inputs to CUDA: model.to('cuda') and inputs = {k: v.to('cuda') for k, v in inputs.items()}.
- Use transformers pipelines for embeddings (the feature-extraction pipeline; see the sketch after the batch example below).
- Try other BAAI models such as bge-base-en (768 dimensions) or bge-large-en (1024 dimensions) for higher-quality embeddings.
- Use batch encoding for multiple texts by passing a list to the tokenizer, as shown below.
texts = ["First sentence.", "Second sentence."]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
last_hidden_state = outputs.last_hidden_state
attention_mask = inputs.attention_mask
mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
sum_embeddings = torch.sum(last_hidden_state * mask_expanded, dim=1)
sum_mask = torch.clamp(mask_expanded.sum(dim=1), min=1e-9)
embeddings = sum_embeddings / sum_mask
print("Batch embeddings shape:", embeddings.shape) output
Batch embeddings shape: torch.Size([2, 384])
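A common next step with batch embeddings is comparing them. Continuing from the batch example above, a short sketch of cosine similarity between the two sentences:

import torch.nn.functional as F

# L2-normalize so the dot product equals cosine similarity
normalized = F.normalize(embeddings, p=2, dim=1)
similarity = normalized @ normalized.T  # (2, 2) pairwise similarity matrix
print("Similarity between the two sentences:", similarity[0, 1].item())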
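For the pipelines variation mentioned above: transformers ships a feature-extraction pipeline, but it returns one vector per token, so you still have to pool the result yourself. A minimal sketch (the exact nesting of the returned lists can vary across transformers versions, so inspect it first):

from transformers import pipeline

# feature-extraction returns per-token vectors as nested lists
extractor = pipeline("feature-extraction", model="BAAI/bge-small-en")
token_vectors = extractor("Hello, this is a test.")
print(len(token_vectors[0]), len(token_vectors[0][0]))  # seq_len, 384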
Troubleshooting
- If you get an OutOfMemoryError (CUDA out of memory), reduce the batch size or run on CPU; a chunked-encoding sketch follows this list.
- If the tokenizer or model download fails, check your internet connection and the model's availability on the Hugging Face Hub.
- Ensure your transformers and torch versions are compatible (the version check in Setup prints both).
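For the out-of-memory case, one simple mitigation is to encode a large list of texts in small chunks rather than all at once. A minimal sketch reusing the model and tokenizer loaded above (encode_in_chunks and chunk_size are illustrative names, not library APIs):

# Encode texts in small chunks to bound peak memory use
def encode_in_chunks(texts, chunk_size=8):
    all_embeddings = []
    for i in range(0, len(texts), chunk_size):
        chunk = texts[i:i + chunk_size]
        inputs = tokenizer(chunk, return_tensors="pt", padding=True, truncation=True)
        with torch.no_grad():
            outputs = model(**inputs)
        # Mean pooling, as in the step-by-step example
        mask = inputs.attention_mask.unsqueeze(-1).float()
        summed = (outputs.last_hidden_state * mask).sum(dim=1)
        counts = mask.sum(dim=1).clamp(min=1e-9)
        all_embeddings.append(summed / counts)
    return torch.cat(all_embeddings, dim=0)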
Key Takeaways
- Use Hugging Face's transformers to load BAAI/bge-small-en for embeddings.
- Pool the last hidden states to get sentence embeddings: mean pooling with the attention mask, or the CLS pooling the BGE model card recommends.
- Batch processing and GPU acceleration improve performance for multiple inputs.
- Keep dependencies updated and check model availability on Hugging Face hub.
- Troubleshoot memory and download issues by adjusting batch size and environment.