ValueError
transformers.TrainerCallback.ValueError
Stack trace
ValueError: Evaluation metric 'accuracy' not found in the metrics returned by compute_metrics.
at transformers.Trainer.evaluate()
Why it happens
The Hugging Face Trainer expects compute_metrics to return a dictionary that contains every evaluation metric a TrainerCallback looks up by name. If a key is missing or misspelled, the lookup fails and this error is raised.
Detection
Before calling Trainer.evaluate(), check that compute_metrics returns a dictionary whose keys exactly match the metric names used in your callbacks and logging.
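One way to run this check before any training starts is to call compute_metrics on a tiny hand-made batch and compare the returned keys with the names you expect. The sketch below is a minimal illustration and not part of transformers: missing_metric_keys is a hypothetical helper, and the compute_metrics stand-in uses plain lists instead of arrays so that it runs without extra dependencies.

```python
from collections.abc import Mapping
from typing import Callable, Iterable, Set


def missing_metric_keys(
    compute_metrics: Callable,
    sample_eval_pred: tuple,
    expected_keys: Iterable[str],
) -> Set[str]:
    """Return the expected keys that compute_metrics does not produce.

    Hypothetical helper for illustration; not a transformers API.
    """
    metrics = compute_metrics(sample_eval_pred)
    if not isinstance(metrics, Mapping):
        raise TypeError(
            f"compute_metrics returned {type(metrics).__name__}, expected a dict"
        )
    return set(expected_keys) - set(metrics)


def compute_metrics(eval_pred):
    # Stand-in that mirrors the broken example, using plain lists
    logits, labels = eval_pred
    # Index of the largest logit in each row (a list-based argmax)
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"acc": correct / len(labels)}


# Two examples, two classes; the labels match the largest logit in each row
sample = ([[0.1, 0.9], [0.8, 0.2]], [1, 0])
missing = missing_metric_keys(compute_metrics, sample, ["accuracy"])
print(missing)
```

An empty set means every expected key is present. A non-empty set names the keys to add or rename before calling Trainer.evaluate().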
Causes & fixes
compute_metrics function does not return the expected metric key
Ensure your compute_metrics function returns a dictionary with the exact metric keys required by your TrainerCallback, e.g., {'accuracy': value}.
Mismatch between metric names used in TrainerCallback and those returned by compute_metrics
Align the metric names in your TrainerCallback configuration with the keys returned by compute_metrics. Dictionary keys are case-sensitive, so 'Accuracy' and 'accuracy' are different keys.
compute_metrics function is missing or not passed to Trainer
Define and pass a valid compute_metrics function to the Trainer constructor to enable metric calculation during evaluation.
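When the cause is a naming mismatch, it can help to list the returned keys that resemble the expected one. The following is a rough sketch using only the standard library. closest_metric_keys is a hypothetical helper name, and the 0.5 similarity cutoff is an arbitrary assumption chosen so that short abbreviations such as 'acc' are still suggested.

```python
import difflib
from typing import Iterable, List


def closest_metric_keys(expected: str, returned_keys: Iterable[str]) -> List[str]:
    """Return the returned keys that look like a misspelling of `expected`.

    An empty list means the key is present or nothing similar was found.
    Hypothetical helper for illustration; not a transformers API.
    """
    keys = list(returned_keys)
    if expected in keys:
        return []
    # Keys that differ only by letter case are the most likely culprits.
    case_matches = [k for k in keys if k.lower() == expected.lower()]
    if case_matches:
        return case_matches
    # Otherwise fall back to fuzzy matching on the spelling.
    return difflib.get_close_matches(expected, keys, n=3, cutoff=0.5)


case_hint = closest_metric_keys("accuracy", ["Accuracy", "loss"])
abbrev_hint = closest_metric_keys("accuracy", ["acc", "loss"])
print(case_hint, abbrev_hint)
```

Whichever candidate is reported, the fix is the same as described above: rename the key in compute_metrics or the name in the callback configuration so that the two strings are identical.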
Code: broken vs fixed
from transformers import Trainer

# model, training_args, and eval_dataset are assumed to be defined earlier

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = logits.argmax(axis=-1)
    # Bug: the dict uses the key 'acc', so 'accuracy' is missing
    return {'acc': (predictions == labels).mean()}

trainer = Trainer(
    model=model,
    args=training_args,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)
trainer.evaluate()  # Triggers the error: the 'accuracy' metric is not found

import os
from transformers import Trainer

# model, training_args, and eval_dataset are assumed to be defined earlier

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = logits.argmax(axis=-1)
    # Fixed: return the 'accuracy' key that is expected
    return {'accuracy': (predictions == labels).mean()}

trainer = Trainer(
    model=model,
    args=training_args,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)
trainer.evaluate()  # Fixed: no error, metric found
print('Evaluation completed successfully')
Workaround
Wrap the evaluation call in a try/except ValueError block. If the metric is missing, log the raw output of compute_metrics for debugging and extract the metrics you need manually as a fallback.
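A minimal sketch of that workaround is shown here. It assumes the error message contains the phrase seen in the stack trace. evaluate_with_fallback, failing_evaluate, and raw_compute_metrics are hypothetical names, and the last two are stand-ins that only imitate the failure so the example runs on its own.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger(__name__)


def evaluate_with_fallback(
    evaluate: Callable[[], Dict[str, float]],
    raw_metrics: Callable[[], Dict[str, float]],
) -> Dict[str, float]:
    """Run evaluate(); on a missing-metric ValueError, log and return raw metrics."""
    try:
        return evaluate()
    except ValueError as err:
        # Only handle the missing-metric case; re-raise anything else.
        if "not found in the metrics" not in str(err):
            raise
        metrics = dict(raw_metrics())
        logger.warning("Evaluation failed (%s); raw metrics: %s", err, metrics)
        return metrics


# Stand-ins for demonstration only; they imitate the failure in the stack trace.
def failing_evaluate():
    raise ValueError(
        "Evaluation metric 'accuracy' not found in the metrics "
        "returned by compute_metrics."
    )


def raw_compute_metrics():
    return {"acc": 0.75}


result = evaluate_with_fallback(failing_evaluate, raw_compute_metrics)
# Manual extraction: accept either spelling of the key
accuracy = result.get("accuracy", result.get("acc"))
print(accuracy)
```

With a real trainer, evaluate would typically be trainer.evaluate, and raw_metrics would be a function that calls your compute_metrics on predictions you obtain yourself. Treat this as a debugging aid. The lasting fix is still to correct the key name.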
Prevention
Define compute_metrics and test that it returns every metric your callbacks expect before you start training or evaluation, so that a missing key surfaces early instead of as a runtime error.