How to · Beginner · 3 min read

NDCG metric for reranking evaluation

Quick answer
The NDCG (Normalized Discounted Cumulative Gain) metric evaluates the quality of reranking by measuring the usefulness of ranked results based on their positions and relevance scores. It normalizes the cumulative gain by the ideal ranking to provide a score between 0 and 1, where higher is better.
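Conceptually, DCG sums each result's relevance discounted by the log of its rank, and NDCG divides that by the DCG of the ideal (descending-relevance) ordering. A minimal hand-rolled sketch for intuition only (the function names `dcg` and `ndcg` here are illustrative, not a library API; in practice you would use scikit-learn as shown later):

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: each relevance divided by
    # log2(rank + 1), with ranks starting at 1
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    # Normalize by the DCG of the ideal (descending-relevance) ordering
    ideal = sorted(ranked_relevances, reverse=True)
    return dcg(ranked_relevances) / dcg(ideal)

# Relevances of six documents in the order a reranker returned them
print(f"{ndcg([3, 2, 3, 2, 1, 0]):.4f}")
```

This uses linear gain (the relevance value itself), which is also scikit-learn's default; some formulations use an exponential gain of 2^rel - 1 instead, which weights highly relevant documents more heavily.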

PREREQUISITES

  • Python 3.8+
  • pip install numpy scikit-learn

Setup

Install the required Python packages numpy and scikit-learn for numerical operations and metric calculation.

bash
pip install numpy scikit-learn

Step by step

This example demonstrates how to compute the NDCG metric for reranking evaluation using scikit-learn's sklearn.metrics.ndcg_score. It assumes you have ground-truth relevance scores for a list of documents and the scores your reranking model assigned to them.

python
import numpy as np
from sklearn.metrics import ndcg_score

# Ground truth relevance scores for documents (higher is more relevant)
true_relevance = np.asarray([[3, 2, 3, 0, 1, 2]])

# Predicted scores from your reranking model
predicted_scores = np.asarray([[0.9, 0.8, 0.75, 0.1, 0.4, 0.7]])

# Compute NDCG@6 (all documents)
ndcg = ndcg_score(true_relevance, predicted_scores, k=6)
print(f"NDCG@6: {ndcg:.4f}")
output
NDCG@6: 0.9817

Common variations

  • Adjust k in ndcg_score to evaluate top-k results (e.g., k=3 for NDCG@3).
  • Use batch inputs to evaluate multiple queries at once by passing arrays with shape (n_queries, n_docs).
  • Calculate DCG or IDCG separately if custom weighting or analysis is needed.
python
import numpy as np
from sklearn.metrics import ndcg_score

# Multiple queries example
true_relevance = np.array([
    [3, 2, 3, 0, 1, 2],
    [2, 1, 2, 3, 0, 1]
])
predicted_scores = np.array([
    [0.9, 0.8, 0.75, 0.1, 0.4, 0.7],
    [0.5, 0.6, 0.4, 0.9, 0.2, 0.3]
])

ndcg_batch = ndcg_score(true_relevance, predicted_scores, k=3)
print(f"Batch NDCG@3: {ndcg_batch:.4f}")
output
Batch NDCG@3: 0.9289
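For the third variation, scikit-learn also exposes sklearn.metrics.dcg_score, so you can compute DCG and the ideal DCG separately; scoring the true relevances against themselves yields IDCG. A short sketch using the single-query data from the main example:

```python
import numpy as np
from sklearn.metrics import dcg_score

true_relevance = np.asarray([[3, 2, 3, 0, 1, 2]])
predicted_scores = np.asarray([[0.9, 0.8, 0.75, 0.1, 0.4, 0.7]])

# DCG of the ranking induced by the predicted scores
dcg = dcg_score(true_relevance, predicted_scores, k=6)

# Ideal DCG: rank the documents by their true relevance
idcg = dcg_score(true_relevance, true_relevance, k=6)

print(f"DCG: {dcg:.4f}, IDCG: {idcg:.4f}, NDCG: {dcg / idcg:.4f}")
```

Having the two terms separately is useful when you want to report raw DCG alongside the normalized score, or to experiment with a custom discount via the log_base parameter.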

Troubleshooting

  • If ndcg_score returns 0, check whether the query's true_relevance values are all zero; scikit-learn scores such queries as 0 by convention. Negative relevance values raise a ValueError.
  • Ensure predicted scores are aligned with the documents' order in true_relevance.
  • A k larger than the number of documents is simply truncated; pass k=None (the default) to score all documents.

Key Takeaways

  • Use sklearn.metrics.ndcg_score to compute NDCG easily for reranking evaluation.
  • Adjust the k parameter to focus on top-k ranked results for more relevant evaluation.
  • Provide true relevance and predicted scores as arrays aligned by document order for accurate scoring.
Verified 2026-04