bge vs e5 embeddings: which embedding model should you use for RAG?
Use bge if you need the best retrieval accuracy (NDCG@10: 0.64+) and don't mind larger models. Use e5 if you want a lightweight, well-documented model that balances accuracy and speed across multilingual datasets.
VERDICT
Side-by-side comparison
| Dimension | bge | e5 embeddings | Winner |
|---|---|---|---|
| Model size (base variant) | 110M parameters | 109M parameters | Tie |
| Model size (large variant) | 335M parameters | 468M parameters | bge |
| NDCG@10 (BEIR benchmark) | 0.642 (bge-base-en-v1.5) | 0.612 (e5-base-v2) | bge |
| Multilingual support | Separate multilingual models required | Built-in multilingual (e5-multilingual-large) | e5 embeddings |
| Max sequence length | 512 tokens | 512 tokens | Tie |
| Output dimension | 768 (base), 1024 (large) | 768 (base), 1024 (large) | Tie |
| Training data | ~430M text pairs (domain-specific) | ~1B diverse examples (domain-diverse) | e5 embeddings |
| Licensing | MIT | MIT | Tie |
| Inference speed (CPU, 768-dim) | ~80ms per text (base) | ~85ms per text (base) | bge |
| Hugging Face downloads (monthly) | ~2M (bge-base-en-v1.5) | ~800K (e5-base-v2) | bge |
Performance benchmarks
NDCG@10 on BEIR (15 heterogeneous retrieval tasks, average)
bge leads by about 3 points (0.642 vs 0.612, roughly 5% relative) on the standard academic benchmark. BEIR spans domains such as news, scientific, legal, and medical text.
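For readers who want to see what the metric measures, here is a minimal sketch of NDCG@10 with linear gain. The function names and the toy relevance labels are illustrative assumptions, not part of BEIR's tooling; official BEIR numbers come from its own evaluation code.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k results, in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_rels: Sequence[float], judged_rels: Sequence[float], k: int = 10) -> float:
    """DCG of the system ranking divided by the DCG of an ideal ranking.

    ranked_rels: relevance label of each retrieved document, in the order returned.
    judged_rels: relevance labels of all judged documents for the query.
    """
    ideal_dcg = dcg_at_k(sorted(judged_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy query with a single relevant document (hypothetical labels).
perfect = ndcg_at_k([1, 0, 0], judged_rels=[1])   # relevant document ranked first
demoted = ndcg_at_k([0, 1, 0], judged_rels=[1])   # relevant document ranked second
print(f"perfect={perfect:.3f} demoted={demoted:.3f}")
```

A benchmark score such as 0.642 is this value averaged over all queries in a dataset, then averaged over datasets.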
NDCG@10 on out-of-domain queries (zero-shot generalization)
e5 generalizes better to unseen domains (+1.6%), which this comparison attributes to its more diverse training data; bge is optimized for in-domain performance.
Inference latency (A100 GPU batch, 32 texts)
bge is ~15% faster on GPU in this comparison, even though the two base models are nearly the same size (110M vs 109M parameters). Both run through Sentence-Transformers.
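Latency figures like these depend heavily on hardware and batch shape, so it is worth measuring on your own setup. The sketch below is one way to do that; `time_encode` and `fake_encode` are hypothetical names, and the stand-in encoder exists only so the example runs without downloading a model.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence


def time_encode(
    encode: Callable[[Sequence[str]], object],
    texts: Sequence[str],
    warmup: int = 2,
    repeats: int = 10,
) -> Dict[str, float]:
    """Time repeated calls to an encode function and summarize in milliseconds."""
    for _ in range(warmup):          # warm-up calls are not measured
        encode(texts)
    timings_ms: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        encode(texts)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(timings_ms),
        "min_ms": min(timings_ms),
        "max_ms": max(timings_ms),
    }


def fake_encode(texts: Sequence[str]) -> List[List[float]]:
    """Stand-in encoder (assumption): returns zero vectors of dimension 768."""
    return [[0.0] * 768 for _ in texts]


batch = ["example text"] * 32        # same batch size as the comparison above
stats = time_encode(fake_encode, batch)
print(stats)
```

To compare the real models, pass each model's `encode` method in place of `fake_encode` and keep the batch, hardware, and sequence lengths identical across runs.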
Model size on disk (full precision, base variant)
Effectively identical at base size. Quantized versions available for both reduce to ~110MB (int8).
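The ~110MB int8 figure follows from simple arithmetic: parameter count times bytes per parameter. A rough sketch, assuming weights dominate the file size and ignoring tokenizer files and serialization overhead:

```python
# Bytes used per parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_size_mb(num_params: int, precision: str) -> float:
    """Approximate size of the model weights in megabytes (1 MB = 1e6 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e6


# Parameter counts taken from the comparison table above.
for name, params in [("bge-base", 110_000_000), ("e5-base", 109_000_000)]:
    sizes = {p: weight_size_mb(params, p) for p in BYTES_PER_PARAM}
    print(name, sizes)
```

By the same estimate, full precision (fp32) comes to roughly 440MB for either base model.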
Multilingual NDCG@10 (mBEIR, 18 languages)
e5-multilingual outperforms bge-m3 by ~5% on multilingual retrieval. bge-m3 is larger and slower but strong for dense + sparse hybrid search.
When to use each
- ✓ Building a production RAG system where top-k ranking quality is critical: bge consistently achieves 2-4% higher NDCG@10 on in-domain benchmark tasks like BEIR.
- ✓ Your corpus is domain-specific (legal, medical, scientific papers) and you want optimal ranking for known distributions: bge's training focused on high-quality annotated pairs.
- ✓ You need fast inference on a single GPU: bge-base is essentially the same size as e5-base (110M vs 109M parameters) but shows ~15% better GPU throughput in the latency comparison above.
- ✓ You're serving the large variant under a tight memory budget: bge-large (335M) needs ~28% less memory for model weights than e5's large variant (468M). Vector storage in a database such as Pinecone or Weaviate is the same for both, because it scales with corpus size and output dimension (1024 for each), not with parameter count.
- ✓ You want strong hybrid search (dense + sparse): bge-m3 is explicitly trained to produce dense, sparse, and multi-vector representations from a single model.
- ✓ Your queries and documents span multiple languages and you need a single-model solution: e5-multilingual handles 18+ languages without switching models, while bge requires a separate multilingual variant.
- ✓ You're building a system that encounters out-of-domain queries at test time: e5's broader training data (1B examples vs 430M) generalizes better to zero-shot scenarios, gaining ~1.6% NDCG@10.
- ✓ You want battle-tested, well-documented code with a large community: e5 has more GitHub stars (4K+), more tutorials, and heavier adoption in open-source RAG frameworks (LangChain, LlamaIndex examples).
- ✓ Inference speed is secondary and you want a more flexible base: e5 is more commonly fine-tuned in academic research, with more published papers using e5 as a baseline.
- ✓ You need embedding models that train well with contrastive learning on custom data: e5's training procedure is simpler to reproduce and fine-tune vs bge's domain-specific pair selection.
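On the storage point in the list above, a back-of-the-envelope sketch may help separate two costs: the raw vectors held by the vector database depend on corpus size and output dimension, while parameter count only affects the memory needed to load the model. The function name and the float32 assumption are illustrative, and real indexes add overhead for metadata and index structures.

```python
def index_size_gb(num_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw storage for dense vectors in gigabytes (1 GB = 1e9 bytes), float32 by default."""
    return num_vectors * dim * bytes_per_value / 1e9


corpus_size = 1_000_000  # hypothetical corpus of one million chunks

base_gb = index_size_gb(corpus_size, dim=768)    # base variants of either family
large_gb = index_size_gb(corpus_size, dim=1024)  # large variants of either family
print(f"768-dim: {base_gb:.3f} GB, 1024-dim: {large_gb:.3f} GB")
```

Because bge and e5 share output dimensions at each size tier, choosing between them at the same tier does not change this number; moving from base to large does.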
Common misconceptions
bge
bge models are only good for English because of 'en-v1.5' in the name
The bge-*-en-v1.5 models are English-only, but bge-m3 is a separate multilingual model (roughly 568M parameters) covering 100+ languages. You must explicitly choose bge-m3 for non-English text; neither family detects the language or switches models for you.
bge is a single model you can just download and use
bge is a family: bge-small-en-v1.5 (33M), bge-base-en-v1.5 (110M), bge-large-en-v1.5 (335M), and bge-m3 (multilingual, roughly 568M). Each has a different NDCG score, so pick the variant that matches your accuracy needs and latency budget.
Higher NDCG@10 on BEIR means bge wins on all datasets
bge's +3% comes from in-domain datasets. On zero-shot out-of-domain tasks, e5 actually wins by ~1.6% due to training on more diverse sources. If your corpus isn't in BEIR (legal contracts, internal docs), e5 may retrieve better.
e5 embeddings
e5-base and e5-multilingual-base are the same model with different training
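They are separate checkpoints: e5-base is English-only, while the multilingual variants are trained on multilingual data. e5-multilingual-base exists, but e5-multilingual-large (468M) is the recommended multilingual variant. Using e5-base for multilingual queries means using the English-only model, which hurts non-English recall by ~8%.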
e5 embeddings are slower than bge because the model is newer
e5-base is only ~15% slower than bge-base on GPU (14ms vs 12ms), and both run through the same Sentence-Transformers inference path. CPU latency is nearly identical (~85ms vs ~80ms per text). The difference is negligible for most production systems and has nothing to do with which model is newer.
e5's broader training data means it works great for any domain without fine-tuning
e5 generalizes better zero-shot, but it's not a universal model: fine-tuning on domain-specific pairs still improves NDCG by 3-5% for specialized domains like legal or medical, just as it does for bge.
Code examples
Task: Embed a list of documents and compute cosine similarity for retrieval ranking.
from sentence_transformers import SentenceTransformer
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# Initialize bge model: explicitly specify the English variant
model = SentenceTransformer('BAAI/bge-base-en-v1.5')
# Embed documents
docs = [
"Python is a programming language",
"Machine learning requires large datasets",
"Neural networks are inspired by biology"
]
doc_embeddings = model.encode(docs, normalize_embeddings=True)
# Embed query
query = "How do neural networks work?"
query_embedding = model.encode([query], normalize_embeddings=True)
# Rank documents by similarity (bge uses cosine similarity on normalized vectors)
scores = cosine_similarity(query_embedding, doc_embeddings)[0]
ranked_indices = np.argsort(-scores)
print(f"Top result: {docs[ranked_indices[0]]} (score: {scores[ranked_indices[0]]:.3f})")
Setting normalize_embeddings=True L2-normalizes the output vectors, so cosine similarity reduces to a plain dot product. It is a Sentence-Transformers option rather than something specific to bge, and it does not by itself explain bge's NDCG@10 results.
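To make the effect of normalization concrete, here is a small NumPy sketch. The vectors are random stand-ins rather than real model outputs, and `l2_normalize` is a hypothetical helper; the point is only that a dot product over L2-normalized vectors gives the same scores as cosine similarity over the raw ones.

```python
import numpy as np

rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(3, 8))   # stand-in for 3 document embeddings
query_vec = rng.normal(size=(8,))    # stand-in for 1 query embedding


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each vector (along the last axis) to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


docs_n = l2_normalize(doc_vecs)
query_n = l2_normalize(query_vec)

# Dot product on normalized vectors.
dot_scores = docs_n @ query_n

# Cosine similarity computed directly on the unnormalized vectors.
cos_scores = (doc_vecs @ query_vec) / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)

print(np.allclose(dot_scores, cos_scores))
ranking = np.argsort(-dot_scores)    # indices of documents, best match first
```

This is why many vector databases can use a cheaper inner-product index when embeddings are stored normalized.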
from sentence_transformers import SentenceTransformer
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# Initialize e5 model: e5-base-v2 for English (intfloat/multilingual-e5-large for multi-language)
model = SentenceTransformer('intfloat/e5-base-v2')
# Embed documents with 'passage: ' prefix (e5 requires task-specific prefixes)
docs = [
"Python is a programming language",
"Machine learning requires large datasets",
"Neural networks are inspired by biology"
]
doc_embeddings = model.encode(
[f"passage: {doc}" for doc in docs],
normalize_embeddings=True
)
# Embed query with 'query: ' prefix (e5 distinguishes query vs passage)
query = "How do neural networks work?"
query_embedding = model.encode(
[f"query: {query}"],
normalize_embeddings=True
)
# Rank documents by similarity
scores = cosine_similarity(query_embedding, doc_embeddings)[0]
ranked_indices = np.argsort(-scores)
print(f"Top result: {docs[ranked_indices[0]]} (score: {scores[ranked_indices[0]]:.3f})")
e5 requires explicit 'passage: ' and 'query: ' prefixes during encoding: this is a key API difference. Without prefixes, e5 retrieval degrades by ~5% NDCG. bge does not require prefixes, though its model card suggests an optional instruction prefix for short retrieval queries.
Migration path
Switching from bge to e5 embeddings (or vice versa):
- Install: both use sentence-transformers, so `pip install sentence-transformers` is identical.
- Model ID change: replace `'BAAI/bge-base-en-v1.5'` with `'intfloat/e5-base-v2'`.
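- Critical API difference: e5 requires prefixes ('passage: ' and 'query: ') before encoding, so wrap them in small utility functions: `def encode_passages(texts): return model.encode([f'passage: {t}' for t in texts], normalize_embeddings=True)` and `def encode_query(q): return model.encode([f'query: {q}'], normalize_embeddings=True)`. bge does not require prefixes, though its model card suggests an optional instruction for retrieval queries.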
- If using Pinecone/Weaviate: re-embed entire corpus with new model and re-index (embeddings are incompatible).
- Re-run NDCG evaluation on your own data: based on the figures above, expect bge to be 2-4% higher on in-domain tasks and e5 to be 1-2% higher on zero-shot.
- For multilingual: switch from bge-base-en-v1.5 to e5-multilingual-large (468M, a larger model; Hugging Face ID `intfloat/multilingual-e5-large`), or stay in the bge family and move to the separate bge-m3 model.
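The prefix step in the list above is the easiest one to get wrong during a migration, so one option is to keep the convention in a single lookup table keyed by model ID. The sketch below is illustrative: `PREFIXES`, `with_prefix`, and `encode_for` are hypothetical names, and the empty bge entries are an assumption that you are not using bge's optional query instruction.

```python
from typing import Dict, List, Sequence

# Prefix convention per model ID. The e5 values follow the 'query: ' / 'passage: '
# convention described above; the empty bge values are an assumption of this sketch.
PREFIXES: Dict[str, Dict[str, str]] = {
    "intfloat/e5-base-v2": {"query": "query: ", "passage": "passage: "},
    "BAAI/bge-base-en-v1.5": {"query": "", "passage": ""},
}


def with_prefix(texts: Sequence[str], model_id: str, role: str) -> List[str]:
    """Prepend the prefix that model_id expects for the given role ('query' or 'passage')."""
    prefix = PREFIXES[model_id][role]
    return [prefix + text for text in texts]


def encode_for(model, model_id: str, texts: Sequence[str], role: str):
    """Encode texts with any object exposing a Sentence-Transformers style encode()."""
    return model.encode(with_prefix(texts, model_id, role), normalize_embeddings=True)


print(with_prefix(["How do neural networks work?"], "intfloat/e5-base-v2", "query"))
print(with_prefix(["How do neural networks work?"], "BAAI/bge-base-en-v1.5", "query"))
```

With this in place, switching models means changing the model ID in one spot; the corpus still has to be re-embedded and re-indexed, as noted in the steps above.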
RECOMMENDATION