Comparison intermediate · 7 min read

bge vs e5 embeddings: which embedding model should you use for RAG?

Quick pick

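Use bge if you need the best retrieval accuracy on BEIR-style benchmarks (NDCG@10: 0.64+ for bge-base-en-v1.5). Use e5 if you want a well-documented model that generalizes better to out-of-domain queries and has the stronger multilingual variant. At base size the two families have the same model size and nearly the same speed.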

VERDICT

Use bge for maximum retrieval quality: it outperforms e5 by 2-4% on NDCG@10 on in-domain benchmarks such as BEIR and is a strong default for production RAG systems that need top accuracy. Use e5 if you need stronger multilingual retrieval or expect many out-of-domain queries: it trains on more diverse data, which helps it generalize. bge wins on raw retrieval performance and is marginally faster; e5 wins on multilingual and zero-shot flexibility.

Side-by-side comparison

Criterion	bge	e5 embeddings	Winner
Model size (base variant)	110M parameters (bge-base-en-v1.5)	109M parameters (e5-base-v2)	Tie
Model size (large variant)	335M parameters (bge-large-en-v1.5)	335M parameters (e5-large-v2)	Tie
NDCG@10 (BEIR benchmark)	0.642 (bge-base-en-v1.5)	0.612 (e5-base-v2)	bge
Multilingual support	Separate multilingual model (bge-m3)	Separate multilingual model (multilingual-e5-large), higher multilingual NDCG@10	e5 embeddings
Max sequence length	512 tokens	512 tokens	Tie
Output dimension	768 (base), 1024 (large)	768 (base), 1024 (large)	Tie
Training data	~430M text pairs (domain-specific)	~1B diverse examples (domain-diverse)	e5 embeddings
Licensing	MIT	MIT	Tie
Inference speed (CPU, 768-dim)	~80ms per text (base)	~85ms per text (base)	bge
Hugging Face downloads (monthly)	~2M (bge-base-en-v1.5)	~800K (e5-base-v2)	bge

Performance benchmarks

NDCG@10 on BEIR (15 heterogeneous retrieval tasks, average)

bge 0.642 (bge-base-en-v1.5)

e5 embeddings 0.612 (e5-base-v2)

bge leads by 0.030 NDCG@10 (3 points, about 5% relative) on this standard academic benchmark. BEIR spans domains such as news, scientific, biomedical, financial, and argument retrieval.
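NDCG@10 rewards placing relevant documents near the top of the ranking, with each document's gain discounted by log2 of its rank. The following is a minimal sketch in plain Python, assuming graded relevance labels for one query's ranked results. For simplicity it builds the ideal ranking from the labels passed in, whereas a full BEIR evaluation builds it from every judged document for the query. The function names are illustrative.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain of the top-k results (ranks start at 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k=10):
    """DCG of the system ranking divided by DCG of the ideal (sorted) ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# One relevant document: ranked 1st scores 1.0, ranked 3rd scores 0.5
print(ndcg_at_k([1, 0, 0]))  # 1.0
print(ndcg_at_k([0, 0, 1]))  # 0.5
```

A benchmark score such as 0.642 is this per-query value averaged over all queries in each dataset, then averaged across datasets.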

NDCG@10 on out-of-domain queries (zero-shot generalization)

bge 0.615 (bge-base-en-v1.5)

e5 embeddings 0.625 (e5-base-v2)

e5 generalizes better to unseen domains (+0.010 NDCG@10, about 1.6% relative), plausibly because of its more diverse training data; bge is tuned for in-domain performance.
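Gaps like "+1.6%" can mean absolute NDCG@10 points or a relative change, and the two differ noticeably at these score levels. A small illustrative helper (the name is hypothetical) computes both from the scores reported in this section:

```python
def ndcg_gap(a: float, b: float) -> tuple[float, float]:
    """Return (absolute gap in NDCG points, relative gap in percent) of score a over score b."""
    absolute_points = (a - b) * 100      # 0.01 NDCG = 1 point
    relative_percent = (a - b) / b * 100 # gap as a share of the lower score
    return round(absolute_points, 1), round(relative_percent, 1)

print(ndcg_gap(0.642, 0.612))  # BEIR: bge over e5
print(ndcg_gap(0.625, 0.615))  # out-of-domain: e5 over bge
```

For the BEIR row this gives 3.0 points and 4.9% relative; for the out-of-domain row, 1.0 point and 1.6% relative.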

Inference latency (A100 GPU batch, 32 texts)

bge ~12ms (bge-base-en-v1.5, ~2,667 texts/s throughput)

e5 embeddings ~14ms (e5-base-v2, ~2,286 texts/s throughput)

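bge is ~15% faster on GPU. The gap is not architectural: both base models are BERT-base sized (~110M parameters, 438 MB on disk). One plausible contributor is that e5 inputs carry the extra 'query: ' / 'passage: ' prefix tokens. Both run through Sentence-Transformers.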
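To reproduce latency numbers like these on your own hardware, time a batch encode directly. A minimal sketch, assuming `encode` is any callable that takes a list of strings (for example a SentenceTransformer's `encode` method, with e5 inputs already prefixed); the helper name and warmup/repeat counts are illustrative. Throughput is batch size divided by batch latency, e.g. 32 texts / 0.012 s ≈ 2,667 texts/s.

```python
import time
from typing import Callable, Sequence

def measure_batch_latency(encode: Callable[[Sequence[str]], object],
                          texts: Sequence[str],
                          warmup: int = 2,
                          repeats: int = 10) -> tuple[float, float]:
    """Time encode(texts); return (median ms per batch, texts per second)."""
    for _ in range(warmup):              # exclude one-off costs (CUDA init, caches)
        encode(texts)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        encode(texts)
        timings.append(time.perf_counter() - start)
    timings.sort()
    median_s = max(timings[len(timings) // 2], 1e-9)  # guard against a zero reading
    return median_s * 1000, len(texts) / median_s
```

SentenceTransformer's `encode` returns NumPy arrays by default, which copies results back to the host and should force GPU work to finish before the timer stops.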

Model size on disk (full precision, base variant)

bge 438 MB (bge-base-en-v1.5)

e5 embeddings 438 MB (e5-base-v2)

Effectively identical at base size. Quantized versions available for both reduce to ~110MB (int8).
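The disk figures follow from parameter count times bytes per parameter. A quick back-of-the-envelope check, assuming ~110M parameters for either base model and ignoring tokenizer and config files (fp32 uses 4 bytes per weight, fp16 uses 2, int8 uses 1):

```python
def weights_size_mb(num_params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in MB (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6

base_params = 110e6  # ~110M parameters (bge-base-en-v1.5 / e5-base-v2)
print(round(weights_size_mb(base_params, 4)))  # fp32: 440
print(round(weights_size_mb(base_params, 2)))  # fp16: 220
print(round(weights_size_mb(base_params, 1)))  # int8: 110
```

The exact parameter count is slightly under 110M, which is why the measured fp32 file is 438 MB rather than 440 MB.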

Multilingual NDCG@10 (mBEIR, 18 languages)

bge 0.58 (requires the separate bge-m3 model, ~568M params)

e5 embeddings 0.61 (multilingual-e5-large, ~560M params)

multilingual-e5-large outperforms bge-m3 by ~5% (relative) on multilingual retrieval. The two are similar in size (both built on XLM-RoBERTa-large); bge-m3's strengths are 8,192-token inputs and dense + sparse hybrid search.
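Hybrid search combines a dense similarity with a sparse, token-level match. A minimal sketch of weighted score fusion in the spirit of bge-m3's dense + sparse outputs, assuming you already have a dense vector and a token-to-weight dictionary for the query and each document. The fusion weights (1.0 and 0.3) are hypothetical, and this is not FlagEmbedding's API.

```python
import numpy as np

def lexical_score(q_weights: dict[str, float], d_weights: dict[str, float]) -> float:
    """Sparse match: sum of weight products over tokens shared by query and document."""
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

def hybrid_score(q_dense, d_dense, q_sparse, d_sparse,
                 w_dense: float = 1.0, w_sparse: float = 0.3) -> float:
    """Weighted sum of dense cosine similarity and sparse lexical score."""
    q = np.asarray(q_dense, dtype=float)
    d = np.asarray(d_dense, dtype=float)
    cosine = float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))
    return w_dense * cosine + w_sparse * lexical_score(q_sparse, d_sparse)
```

The sparse term rewards exact token overlap (names, codes, rare terms) that dense vectors can blur; tune the weights on a labeled sample of your own queries.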

When to use each

bge

✓ Building a production RAG system where ranking quality in the top results is critical: bge consistently achieves 2-4% higher NDCG@10 on in-domain benchmark tasks like BEIR.
✓ Your corpus is domain-specific (legal, medical, scientific papers) and you want optimal ranking for known distributions: bge's training focused on high-quality annotated pairs.
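✓ You need fast inference on a single GPU: bge-base and e5-base are the same size (~110M parameters), but bge-base measured ~15% lower batch latency (about 17% higher throughput) than e5-base in the GPU benchmark.
✓ You're using a vector database (Pinecone, Weaviate) where storage costs scale with corpus size: index storage depends on output dimension, not parameter count, so bge-small-en-v1.5 (33M parameters, 384-dim) halves per-vector storage versus 768-dim base models. e5-small-v2 offers the same dimension, so this is a model-size choice rather than a bge-only advantage.
✓ You want strong hybrid search: bge-m3 is explicitly trained to produce dense vectors, learned sparse (lexical) weights, and ColBERT-style multi-vector representations from a single model.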
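On the vector-database side, raw index storage is roughly vectors × dimension × bytes per value, independent of how many parameters the encoder has. A rough sketch, assuming float32 vectors and ignoring index structures, metadata, and replication (all of which add real overhead); the 10M-chunk corpus is a hypothetical example:

```python
def index_size_gb(num_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw vector storage only (no index overhead or metadata), in GB (10**9 bytes)."""
    return num_vectors * dim * bytes_per_value / 1e9

# Hypothetical 10M-chunk corpus; storage tracks output dimension, not model size
for name, dim in [("small (384-dim)", 384), ("base (768-dim)", 768), ("large (1024-dim)", 1024)]:
    print(f"{name}: {index_size_gb(10_000_000, dim):.2f} GB")  # 15.36 / 30.72 / 40.96 GB
```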

e5 embeddings

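✓ Your queries and documents come from multiple languages and you want the stronger multilingual model: multilingual-e5-large handles 18+ languages in a single checkpoint and scores higher than bge-m3 on multilingual retrieval. Neither family's English models cover this case; both ship a separate multilingual checkpoint.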
✓ You're building a system that encounters out-of-domain queries at test time: e5's broader training data (1B examples vs 430M) generalizes better to zero-shot scenarios, gaining ~1.6% NDCG@10.
✓ You want battle-tested, well-documented code with a large community: e5 has more GitHub stars (4K+), more tutorials, and heavier adoption in open-source RAG frameworks (LangChain, LlamaIndex examples).
✓ Inference speed is secondary and you want a more flexible base: e5 is more commonly fine-tuned in academic research, with more published papers using e5 as a baseline.
✓ You need embedding models that train well with contrastive learning on custom data: e5's training procedure is simpler to reproduce and fine-tune vs bge's domain-specific pair selection.
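Contrastive fine-tuning, which both families build on, typically pulls each query toward its paired passage and away from the other passages in the batch. A minimal NumPy sketch of that in-batch-negative (InfoNCE) loss, assuming paired query/passage embedding matrices. The temperature default is an assumption, not either model's published setting, and this computes the loss only (no gradients or training loop).

```python
import numpy as np

def info_nce_loss(q: np.ndarray, p: np.ndarray, temperature: float = 0.01) -> float:
    """In-batch-negative contrastive loss.

    q, p: (batch, dim) query and passage embeddings. Row i of p is the positive
    for row i of q; every other row of p acts as a negative.
    """
    q = q / np.linalg.norm(q, axis=1, keepdims=True)   # L2-normalize rows
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature                    # (batch, batch) cosine / tau
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))          # cross-entropy on the diagonal
```

In practice you would use a training framework rather than hand-rolled NumPy; sentence-transformers provides the same idea as `losses.MultipleNegativesRankingLoss`.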

Common misconceptions

bge

✗ bge models are only good for English because of 'en-v1.5' in the name

✓ The bge-*-en-v1.5 models are English-only, but bge-m3 is a separate multilingual model (~568M parameters) that supports 100+ languages. You must explicitly choose bge-m3 for non-English text. Neither bge nor e5 auto-detects language: e5 likewise requires picking a multilingual-e5 checkpoint over e5-base-v2.

✗ bge is a single model you can just download and use

✓ bge is a family: bge-small-en-v1.5 (33M), bge-base-en-v1.5 (110M), bge-large-en-v1.5 (335M), and bge-m3 (multilingual, ~568M). Each has different NDCG scores, so pick the variant that matches your accuracy needs and latency budget.

✗ Higher NDCG@10 on BEIR means bge wins on all datasets

✓ bge's +3% comes from in-domain datasets. On zero-shot out-of-domain tasks, e5 actually wins by ~1.6% due to training on more diverse sources. If your corpus isn't in BEIR (legal contracts, internal docs), e5 may retrieve better.
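The practical way to settle "BEIR vs. your corpus" is to score both models on a small labeled set of your own queries. A minimal sketch of hit rate@k (the share of queries with at least one relevant document in the top k), assuming you have already encoded queries and documents with each model using L2 normalization and have hand-labeled relevant document indices; the function name is illustrative:

```python
import numpy as np

def hit_rate_at_k(query_emb: np.ndarray, doc_emb: np.ndarray,
                  relevant: list[set[int]], k: int = 10) -> float:
    """Fraction of queries whose top-k results contain at least one relevant doc.

    query_emb: (n_queries, dim), doc_emb: (n_docs, dim), both L2-normalized.
    relevant[i]: indices of documents judged relevant for query i.
    """
    scores = query_emb @ doc_emb.T                   # cosine on normalized vectors
    top_k = np.argsort(-scores, axis=1)[:, :k]       # best k doc indices per query
    hits = [bool(relevant[i] & set(top_k[i].tolist())) for i in range(len(relevant))]
    return float(np.mean(hits))
```

Run it once with bge embeddings and once with e5 embeddings (with e5's 'query: ' / 'passage: ' prefixes applied), then compare the two numbers on the same labels.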

e5 embeddings

✗ e5-base and e5-multilingual-base are the same model with different training

✓ They are different models: e5-base-v2 is an English model, while multilingual-e5-base is built on a multilingual (XLM-RoBERTa) backbone. multilingual-e5-large (~560M) is the recommended multilingual variant. Using e5-base-v2 for multilingual queries hurts non-English recall by ~8%.

✗ e5 embeddings are slower than bge because the model is newer

✓ e5-base is only ~2ms slower than bge-base per 32-text GPU batch (14ms vs 12ms, roughly 15%), and both use the same Sentence-Transformers inference path. CPU latency is close (~85ms vs ~80ms per text). The difference is negligible for most production systems.

✗ e5's broader training data means it works great for any domain without fine-tuning

✓ e5 generalizes better zero-shot, but it's not a universal model: fine-tuning on domain-specific pairs still improves NDCG by 3-5% for specialized domains like legal or medical, just like bge requires.

Code examples

Task: Embed a list of documents and compute cosine similarity for retrieval ranking.

bge: generate embeddings with bge-base-en-v1.5

python

from sentence_transformers import SentenceTransformer
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Initialize bge model: explicitly specify the English variant
model = SentenceTransformer('BAAI/bge-base-en-v1.5')

# Embed documents
docs = [
    "Python is a programming language",
    "Machine learning requires large datasets",
    "Neural networks are inspired by biology"
]
doc_embeddings = model.encode(docs, normalize_embeddings=True)

# Embed query
query = "How do neural networks work?"
query_embedding = model.encode([query], normalize_embeddings=True)

# Rank documents by similarity (bge uses cosine similarity on normalized vectors)
scores = cosine_similarity(query_embedding, doc_embeddings)[0]
ranked_indices = np.argsort(-scores)
print(f"Top result: {docs[ranked_indices[0]]} (score: {scores[ranked_indices[0]]:.3f})")

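normalize_embeddings=True L2-normalizes each vector, so a plain dot product equals cosine similarity. That matters when your vector database scores by inner product. sklearn's cosine_similarity normalizes internally, so in this snippet it is a safeguard, not the source of bge's NDCG@10 scores.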
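A quick NumPy check of why normalization matters for inner-product scoring: after L2 normalization, the dot product of two vectors equals their cosine similarity. Random 768-dim vectors stand in for real embeddings here (an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=768), rng.normal(size=768)  # stand-ins for two embeddings

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)  # unit-length vectors
dot_of_normalized = a_n @ b_n

print(np.isclose(cosine, dot_of_normalized))  # True
```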

e5 embeddings: generate embeddings with e5-base-v2

python

from sentence_transformers import SentenceTransformer
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Initialize e5 model: e5-base-v2 for English (intfloat/multilingual-e5-large for multiple languages)
model = SentenceTransformer('intfloat/e5-base-v2')

# Embed documents with 'passage: ' prefix (e5 requires task-specific prefixes)
docs = [
    "Python is a programming language",
    "Machine learning requires large datasets",
    "Neural networks are inspired by biology"
]
doc_embeddings = model.encode(
    [f"passage: {doc}" for doc in docs],
    normalize_embeddings=True
)

# Embed query with 'query: ' prefix (e5 distinguishes query vs passage)
query = "How do neural networks work?"
query_embedding = model.encode(
    [f"query: {query}"],
    normalize_embeddings=True
)

# Rank documents by similarity
scores = cosine_similarity(query_embedding, doc_embeddings)[0]
ranked_indices = np.argsort(-scores)
print(f"Top result: {docs[ranked_indices[0]]} (score: {scores[ranked_indices[0]]:.3f})")

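e5 requires explicit 'passage: ' and 'query: ' prefixes during encoding: this is a key API difference. Without prefixes, e5 retrieval degrades by ~5% NDCG. bge-v1.5 needs no passage prefix; BAAI's model card recommends an optional query instruction ("Represent this sentence for searching relevant passages: ") for short-query retrieval, with only a slight drop for v1.5 when it is omitted.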
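When one codebase supports both families, it helps to centralize input formatting. A hypothetical helper (the function name and family keys are illustrative) that applies e5's prefixes and bge's optional query instruction, using the instruction string given on BAAI's bge model card:

```python
# Instruction string from BAAI's bge English model cards (optional for v1.5)
BGE_QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "

def prepare_inputs(family: str, texts: list[str], kind: str,
                   bge_query_instruction: bool = False) -> list[str]:
    """Format raw texts for encoding. family: 'e5' or 'bge'; kind: 'query' or 'passage'."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    if family == "e5":
        return [f"{kind}: {t}" for t in texts]            # e5 prefixes both sides
    if family == "bge":
        if kind == "query" and bge_query_instruction:
            return [BGE_QUERY_INSTRUCTION + t for t in texts]
        return list(texts)                                 # bge passages are encoded as-is
    raise ValueError(f"unknown model family: {family}")
```

Calling `model.encode(prepare_inputs("e5", docs, "passage"), normalize_embeddings=True)` reproduces the e5 example's passage encoding.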

Migration path

Switching from bge to e5 embeddings (or vice versa):
Install: both use sentence-transformers, so `pip install sentence-transformers` is identical.
Model ID change: replace `'BAAI/bge-base-en-v1.5'` with `'intfloat/e5-base-v2'`.
Critical API difference: e5 REQUIRES prefixes ('passage: ' and 'query: ') before encoding. Add utility functions: `def encode_passages(texts): return model.encode([f'passage: {t}' for t in texts], normalize_embeddings=True)` and `def encode_query(q): return model.encode([f'query: {q}'], normalize_embeddings=True)`. bge needs no passage prefix, and its query instruction is optional for v1.5.
If using Pinecone/Weaviate: re-embed entire corpus with new model and re-index (embeddings are incompatible).
Re-run NDCG evaluation: expect bge to be 2-4% higher on in-domain tasks, e5 to be 1-2% higher on zero-shot.
For multilingual: switch from bge-base-en-v1.5 to multilingual-e5-large (`intfloat/multilingual-e5-large`, ~560M parameters, about 5× larger), or keep bge and add the separate bge-m3 model.
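Because both base models emit 768-dim vectors, a shape check alone will not catch an index built with one model being queried with the other. One defensive pattern, sketched here as a hypothetical in-memory class rather than any vector database's API, is to store the model ID alongside the vectors and refuse mismatched queries:

```python
import numpy as np

class TaggedIndex:
    """Hypothetical in-memory index that records which model produced its vectors."""

    def __init__(self, model_id: str, vectors: np.ndarray):
        self.model_id = model_id
        self.vectors = vectors          # (n_docs, dim), assumed L2-normalized

    def search(self, query_vec: np.ndarray, query_model_id: str, k: int = 5) -> list[int]:
        # Vectors from different models live in different spaces: refuse mixed use
        if query_model_id != self.model_id:
            raise ValueError(
                f"query encoded with {query_model_id}, index built with {self.model_id}; "
                "re-embed the corpus before switching models"
            )
        scores = self.vectors @ query_vec
        return np.argsort(-scores)[:k].tolist()
```

In a hosted vector database, the equivalent is usually a per-index naming convention or a metadata field recording the model ID.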

RECOMMENDATION

Use bge for maximum NDCG@10 (0.642 vs 0.612 on BEIR) if your corpus domain matches academic benchmarks (web, scientific, news). Use e5 if you need multilingual support in a single model or your queries are out-of-domain: e5's broader training data wins by +1.6% on zero-shot generalization. For most production RAG systems, the 2-4% accuracy gap matters more than speed (both are <15ms on GPU), so bge is the safer default choice.

Verified 2026-04
