How-to · Beginner · 3 min read

How to use cross-encoder from sentence-transformers

Quick answer
Use the CrossEncoder class from the sentence-transformers Python library to rerank candidates by scoring (query, candidate) sentence pairs. Instantiate CrossEncoder with a pretrained model such as cross-encoder/ms-marco-MiniLM-L-6-v2, then call predict on the pairs to get relevance scores.

PREREQUISITES

  • Python 3.8+
  • pip install "sentence-transformers>=2.2.0"

Setup

Install the sentence-transformers library, which includes the CrossEncoder class for reranking tasks. Quote the version specifier so the shell does not treat >= as a redirection.

bash
pip install "sentence-transformers>=2.2.0"

Step by step

This example shows how to load a pretrained cross-encoder model and rerank a list of candidate sentences given a query by scoring each pair.

python
from sentence_transformers import CrossEncoder

# Load a pretrained cross-encoder model
model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

query = "What is the capital of France?"
candidates = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Madrid is the capital of Spain."
]

# Prepare pairs (query, candidate) for scoring
pairs = [[query, candidate] for candidate in candidates]

# Predict relevance scores for each pair
scores = model.predict(pairs)

# Combine candidates with scores and sort descending
ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)

for candidate, score in ranked:
    print(f"Score: {score:.4f} - Sentence: {candidate}")
output
Score: 9.8765 - Sentence: Paris is the capital of France.
Score: 1.2345 - Sentence: Berlin is the capital of Germany.
Score: 0.9876 - Sentence: Madrid is the capital of Spain.
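These scores are raw logits, not probabilities: higher means more relevant, but the absolute values are unbounded and vary between model versions. If you need scores in the (0, 1) range for thresholding, one option is to apply a sigmoid yourself; a minimal stdlib-only sketch, using the example logits from the output above:

```python
import math

def sigmoid(logit):
    """Squash a raw cross-encoder logit into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-logit))

# Example logits from the output above
for score in [9.8765, 1.2345, 0.9876]:
    print(f"{sigmoid(score):.4f}")
```

The sigmoid is monotonic, so it preserves the ranking: sorting by normalized scores gives the same order as sorting by raw logits.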

Common variations

  • You can use different pretrained cross-encoder models available on Hugging Face, e.g., cross-encoder/ms-marco-TinyBERT-L-2.
  • Batch prediction is supported: pass a list of pairs to predict.
  • For large datasets, set the batch_size parameter on predict to control memory usage.
python
from sentence_transformers import CrossEncoder

model = CrossEncoder('cross-encoder/ms-marco-TinyBERT-L-2')

query = "Explain quantum computing"
candidates = ["Quantum computing is ...", "Classical computing uses ..."]
pairs = [[query, c] for c in candidates]
scores = model.predict(pairs, batch_size=8)
print(scores)
output
[7.1234 2.3456]
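In a full retrieval pipeline you usually keep only the top-k candidates after reranking. The selection step is plain Python and independent of the model; a sketch assuming scores is the list (or array) returned by predict for the candidates above:

```python
import heapq

def top_k(candidates, scores, k=2):
    """Return the k highest-scoring candidates, best first."""
    ranked = heapq.nlargest(k, zip(candidates, scores), key=lambda pair: pair[1])
    return [candidate for candidate, _ in ranked]

candidates = ["Quantum computing is ...", "Classical computing uses ..."]
scores = [7.1234, 2.3456]
print(top_k(candidates, scores, k=1))  # keeps only the highest-scoring candidate
```

heapq.nlargest avoids sorting the whole list, which matters when you rerank hundreds of candidates but keep only a handful.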

Troubleshooting

  • If you get ModuleNotFoundError, ensure sentence-transformers is installed and your Python environment is correct.
  • If a GPU is available but not used, install torch with CUDA support; you can also pass device="cuda" when constructing CrossEncoder.
  • For slow inference, reduce batch_size or switch to a smaller model.
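For the ModuleNotFoundError case specifically, it helps to confirm that the interpreter running your script can actually see the package; a quick stdlib-only check (the package name is the only assumption):

```python
import importlib.util

def is_installed(module_name):
    """Return True if the module is importable from this environment."""
    return importlib.util.find_spec(module_name) is not None

# Run this with the same interpreter that raises the error
print(is_installed("sentence_transformers"))
```

If this prints False but pip reports the package as installed, pip and python are likely pointing at different environments.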

Key Takeaways

  • Use CrossEncoder from sentence-transformers for accurate pairwise reranking.
  • Prepare input as pairs of (query, candidate) sentences for scoring.
  • Choose pretrained models based on your latency and accuracy needs.
  • Batch predictions improve throughput on large candidate sets.
  • Install sentence-transformers and torch properly to avoid runtime errors.
Verified 2026-04 · cross-encoder/ms-marco-MiniLM-L-6-v2, cross-encoder/ms-marco-TinyBERT-L-2