ValueError
ragas.evaluation.metrics.ValueError
Stack trace
Traceback (most recent call last):
  File "test_ragas.py", line 42, in <module>
    score = ragas.evaluate_metric(predictions, references)
  File "/usr/local/lib/python3.9/site-packages/ragas/evaluation/metrics.py", line 88, in evaluate_metric
    result = compute_metric(preds, refs)
  File "/usr/local/lib/python3.9/site-packages/ragas/evaluation/metrics.py", line 55, in compute_metric
    raise ValueError('Metric calculation resulted in NaN value')
ValueError: Metric calculation resulted in NaN value
Why it happens
Ragas evaluation metrics compute scores by comparing predictions against references. If the inputs are empty lists, have mismatched lengths, or contain only empty or zero values, an intermediate step can divide by zero or perform another invalid operation, and the result is NaN. The metric code then raises this ValueError rather than returning the NaN.
Detection
Validate the inputs before computing the metric: check that predictions and references are non-empty, have the same length, and contain valid (non-blank, non-null) values, so that NaN risks are caught early.
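A minimal sketch of such a pre-check, assuming predictions and references are plain lists of strings. The helper name `find_input_problems` is hypothetical and not part of Ragas; it only inspects the inputs and reports what it finds, leaving the decision to the caller.

```python
from typing import List, Optional


def find_input_problems(
    predictions: List[Optional[str]], references: List[Optional[str]]
) -> List[str]:
    """Return human-readable descriptions of inputs likely to yield NaN.

    Hypothetical helper, not part of Ragas: it only inspects the inputs.
    """
    problems = []
    if not predictions or not references:
        problems.append("predictions or references list is empty")
    if len(predictions) != len(references):
        problems.append(
            f"length mismatch: {len(predictions)} predictions "
            f"vs {len(references)} references"
        )
    # Report the positions of blank or missing entries on either side
    for name, values in (("prediction", predictions), ("reference", references)):
        blank = [i for i, v in enumerate(values) if v is None or not str(v).strip()]
        if blank:
            problems.append(f"blank {name} at index(es) {blank}")
    return problems


if __name__ == "__main__":
    for problem in find_input_problems(["", "Paris"], ["Answer1", "Answer2"]):
        print(problem)
```

An empty return value means none of these particular risks were found; it does not guarantee that the metric itself cannot produce NaN for other reasons.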
Causes & fixes
Cause: Prediction or reference lists are empty or contain only empty strings.
Fix: Validate the inputs and filter out empty or null entries from predictions and references before passing them to the metric function.
Cause: Predictions and references differ in length, which makes the metric computation invalid.
Fix: Ensure the predictions and references lists have the same length before evaluation.
Cause: All predictions or all references are identical or zero, leading to division by zero in the metric formula.
Fix: Add a check that detects uniform or zero-only inputs and handles them by returning a default score or skipping the metric calculation.
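The check for uniform or blank-only inputs can be sketched as a small guard placed in front of the metric. Here `metric_fn` stands in for whatever scoring callable is in use, and the names `is_degenerate` and `score_or_default`, as well as the default of 0.0, are illustrative assumptions rather than Ragas behavior.

```python
from typing import Callable, Sequence


def is_degenerate(values: Sequence[str]) -> bool:
    """True if values is empty, contains only blank strings, or repeats one value."""
    stripped = [v.strip() for v in values]
    if not any(stripped):  # empty list, or only blank strings
        return True
    # Two or more entries that are all identical
    return len(stripped) > 1 and len(set(stripped)) == 1


def score_or_default(
    metric_fn: Callable[[Sequence[str], Sequence[str]], float],
    predictions: Sequence[str],
    references: Sequence[str],
    default: float = 0.0,
) -> float:
    """Skip metric_fn and return default when either side is degenerate."""
    if is_degenerate(predictions) or is_degenerate(references):
        return default
    return metric_fn(predictions, references)


if __name__ == "__main__":
    # Toy stand-in metric (an assumption for the demo): fraction of exact matches
    def exact_match(preds, refs):
        return sum(p == r for p, r in zip(preds, refs)) / len(preds)

    print(score_or_default(exact_match, ["", "", ""], ["Answer1", "Answer2", "Answer3"]))
    print(score_or_default(exact_match, ["Answer1", "x"], ["Answer1", "Answer2"]))
```

Whether identical inputs are actually a problem depends on the metric; this guard is deliberately conservative and may need loosening for metrics that handle uniform inputs correctly.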
Code: broken vs fixed
# Broken
import ragas

predictions = ["", "", ""]  # Empty predictions
references = ["Answer1", "Answer2", "Answer3"]

# This line raises ValueError because the metric evaluates to NaN
score = ragas.evaluate_metric(predictions, references)
print(f"Ragas score: {score}")

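# Fixed
import ragas

predictions = ["", "", ""]
references = ["Answer1", "Answer2", "Answer3"]

# Check alignment first: zip() below would silently truncate the longer list
if len(predictions) != len(references):
    raise ValueError("Predictions and references must have the same length")

# Drop pairs with an empty side together, so the two lists stay aligned
pairs = [(p, r) for p, r in zip(predictions, references) if p.strip() and r.strip()]

if not pairs:
    # Nothing left to score: skip the metric rather than let it produce NaN
    print("No valid prediction/reference pairs; skipping evaluation")
else:
    predictions, references = map(list, zip(*pairs))
    score = ragas.evaluate_metric(predictions, references)  # Fixed: cleaned, aligned inputs
    print(f"Ragas score: {score}")
Workaround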
Wrap the metric call in try/except ValueError, and on exception, log inputs and return a default score like 0.0 to avoid crashing.
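A minimal sketch of that wrapper, assuming the metric is available as a callable taking predictions and references. The name `evaluate_with_fallback` and the extra check for a returned NaN are assumptions added for illustration, not part of Ragas.

```python
import logging
import math
from typing import Callable, Sequence

logger = logging.getLogger(__name__)


def evaluate_with_fallback(
    metric_fn: Callable[[Sequence[str], Sequence[str]], float],
    predictions: Sequence[str],
    references: Sequence[str],
    default: float = 0.0,
) -> float:
    """Call metric_fn; on ValueError or a NaN result, log the inputs and return default."""
    try:
        score = metric_fn(predictions, references)
    except ValueError:
        # logger.exception records the traceback together with the message
        logger.exception(
            "Metric failed; predictions=%r references=%r", predictions, references
        )
        return default
    # Also cover a metric that returns NaN instead of raising
    if isinstance(score, float) and math.isnan(score):
        logger.warning(
            "Metric returned NaN; predictions=%r references=%r", predictions, references
        )
        return default
    return score
```

A default of 0.0 is indistinguishable from a genuinely poor score once results are aggregated, so it may be worth counting fallbacks separately or passing a sentinel such as `default=-1.0` that downstream code can filter out.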
Prevention
Implement strict input validation and normalization pipelines before evaluation, and use unit tests to catch empty or invalid data cases early.
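As a sketch of what such unit tests might look like, the example below pairs a small validator with `unittest` cases for the empty, mismatched, and blank-input scenarios. The validator `validate_pairs` is hypothetical and defined here only so the tests are self-contained; in practice the tests would target whatever validation step the pipeline already uses.

```python
import unittest


def validate_pairs(predictions, references):
    """Hypothetical validator: raise ValueError on inputs likely to produce NaN."""
    if not predictions or not references:
        raise ValueError("predictions and references must be non-empty")
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if any(not p.strip() for p in predictions) or any(not r.strip() for r in references):
        raise ValueError("predictions and references must not contain blank entries")


class ValidatePairsTests(unittest.TestCase):
    def test_accepts_clean_inputs(self):
        validate_pairs(["Paris", "42"], ["Paris", "41"])  # should not raise

    def test_rejects_empty_lists(self):
        with self.assertRaises(ValueError):
            validate_pairs([], [])

    def test_rejects_length_mismatch(self):
        with self.assertRaises(ValueError):
            validate_pairs(["a"], ["a", "b"])

    def test_rejects_blank_predictions(self):
        with self.assertRaises(ValueError):
            validate_pairs(["", "  "], ["a", "b"])
```

Saved as a module, the cases can be run with `python -m unittest` followed by the module name.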