How-to · Beginner · 3 min read

How to use cross-encoder from sentence-transformers

Quick answer
Use the CrossEncoder class from the sentence-transformers Python library to rerank candidates by scoring (query, candidate) sentence pairs. Instantiate CrossEncoder with a pretrained model such as cross-encoder/ms-marco-MiniLM-L-6-v2, then call predict on the pairs to get relevance scores.

PREREQUISITES

  • Python 3.8+
  • pip install "sentence-transformers>=2.2.0"

Setup

Install the sentence-transformers library, which includes the CrossEncoder class for reranking tasks. Quote the version specifier so the shell does not treat >= as a redirection.

bash
pip install "sentence-transformers>=2.2.0"

Step by step

This example shows how to load a pretrained cross-encoder model and rerank a list of candidate sentences given a query by scoring each pair.

python
from sentence_transformers import CrossEncoder

# Load a pretrained cross-encoder model
model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

query = "What is the capital of France?"
candidates = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain."
]

# Prepare pairs (query, candidate) for scoring
pairs = [[query, candidate] for candidate in candidates]

# Predict relevance scores for each pair
scores = model.predict(pairs)

# Combine candidates with scores and sort descending
ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)

for candidate, score in ranked:
    print(f"Score: {score:.4f} - Sentence: {candidate}")
output
Score: 9.8765 - Sentence: Paris is the capital of France.
Score: 1.2345 - Sentence: Berlin is the capital of Germany.
Score: 0.9876 - Sentence: Madrid is the capital of Spain.
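These scores are raw logits, not probabilities: higher means more relevant, but the absolute values are unbounded and vary between model versions. If you need scores in the (0, 1) range for thresholding, one option is to apply a sigmoid yourself; a minimal stdlib-only sketch, using the example logits from the output above:

```python
import math

def sigmoid(logit):
    """Squash a raw cross-encoder logit into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-logit))

# Example logits from the output above
for score in [9.8765, 1.2345, 0.9876]:
    print(f"{sigmoid(score):.4f}")
```

The sigmoid is monotonic, so it preserves the ranking: sorting by normalized scores gives the same order as sorting by raw logits.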

Common variations

  • You can use different pretrained cross-encoder models available on Hugging Face, e.g., cross-encoder/ms-marco-TinyBERT-L-2.
  • Batch prediction is supported: pass a list of pairs to predict.
  • For large datasets, set the batch_size parameter on predict to control memory usage.
python
from sentence_transformers import CrossEncoder

model = CrossEncoder('cross-encoder/ms-marco-TinyBERT-L-2')

query = "Explain quantum computing"
candidates = ["Quantum computing is ...", "Classical computing uses ..."]
pairs = [[query, c] for c in candidates]
scores = model.predict(pairs, batch_size=8)
print(scores)
output
[7.1234 2.3456]
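In a full retrieval pipeline you usually keep only the top-k candidates after reranking. The selection step is plain Python and independent of the model; a sketch assuming scores is the list (or array) returned by predict for the candidates above:

```python
import heapq

def top_k(candidates, scores, k=2):
    """Return the k highest-scoring candidates, best first."""
    ranked = heapq.nlargest(k, zip(candidates, scores), key=lambda pair: pair[1])
    return [candidate for candidate, _ in ranked]

candidates = ["Quantum computing is ...", "Classical computing uses ..."]
scores = [7.1234, 2.3456]
print(top_k(candidates, scores, k=1))  # keeps only the highest-scoring candidate
```

heapq.nlargest avoids sorting the whole list, which matters when you rerank hundreds of candidates but keep only a handful.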

Troubleshooting

  • If you get ModuleNotFoundError, ensure sentence-transformers is installed and your Python environment is correct.
  • If a GPU is available but not used, install torch with CUDA support; you can also pass device="cuda" when constructing CrossEncoder.
  • For slow inference, reduce batch_size or switch to a smaller model.
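For the ModuleNotFoundError case specifically, it helps to confirm that the interpreter running your script can actually see the package; a quick stdlib-only check (the package name is the only assumption):

```python
import importlib.util

def is_installed(module_name):
    """Return True if the module is importable from this environment."""
    return importlib.util.find_spec(module_name) is not None

# Run this with the same interpreter that raises the error
print(is_installed("sentence_transformers"))
```

If this prints False but pip reports the package as installed, pip and python are likely pointing at different environments.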

Key Takeaways

  • Use CrossEncoder from sentence-transformers for accurate pairwise reranking.
  • Prepare input as pairs of (query, candidate) sentences for scoring.
  • Choose pretrained models based on your latency and accuracy needs.
  • Batch predictions improve throughput on large candidate sets.
  • Install sentence-transformers and torch properly to avoid runtime errors.
Verified 2026-04 · cross-encoder/ms-marco-MiniLM-L-6-v2, cross-encoder/ms-marco-TinyBERT-L-2