ValueError
transformers.TrainerCallback.EvaluationStrategy.ValueError
Stack trace
ValueError: Metric function must return a dictionary with string keys and numeric values.
  File "/usr/local/lib/python3.9/site-packages/transformers/trainer.py", line 1234, in evaluate
    metrics = self.compute_metrics(eval_preds)
  File "/app/train.py", line 45, in compute_metrics
    raise ValueError("Metric function must return a dictionary with string keys and numeric values.")
Why it happens
During evaluation, the Trainer calls compute_metrics and expects it to return a dictionary with string keys and numeric values. If the function returns None, a non-dict, or keys or values of the wrong type, evaluation fails with this error. Common triggers are a compute_metrics function with the wrong signature and metric-library output that is returned without being adapted to this format.
Detection
Add logging inside your compute_metrics function to verify it returns a dict with string keys and numeric values before the Trainer uses it. Catch ValueError exceptions during evaluation to log raw metric outputs.
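The logging step above can be sketched as a decorator. This is a minimal illustration, not part of transformers: the names log_metric_output and metrics_debug are hypothetical, and the decorated function body is a stand-in. The wrapper records the return type and the type of every key and value, then passes the result through unchanged.

```python
import functools
import logging

logger = logging.getLogger("metrics_debug")  # hypothetical logger name


def log_metric_output(fn):
    """Log what a compute_metrics-style function returns, without altering it."""

    @functools.wraps(fn)
    def wrapper(eval_pred):
        result = fn(eval_pred)
        logger.info("compute_metrics returned %s", type(result).__name__)
        if isinstance(result, dict):
            for key, value in result.items():
                # Log both the content and the type of each entry.
                logger.info(
                    "  %r (%s) -> %r (%s)",
                    key, type(key).__name__, value, type(value).__name__,
                )
        return result

    return wrapper


@log_metric_output
def compute_metrics(eval_pred):
    # Stand-in body for illustration; a real function would score eval_pred.
    return {"accuracy": 0.5}
```

The messages are emitted at INFO level, so they only appear once logging is configured, for example with logging.basicConfig(level=logging.INFO).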
Causes & fixes
compute_metrics function returns None or a non-dictionary value
Ensure your compute_metrics function always returns a dictionary with string keys and numeric values, never None.
Metric function returns dictionary with non-string keys or non-numeric values
Convert all keys to strings and all values to floats or ints before returning from compute_metrics.
Incorrect compute_metrics function signature or missing expected parameters
Define compute_metrics to accept a single argument (EvalPrediction) and extract predictions and labels properly.
Using a metric library output directly without adapting to Trainer's expected format
Wrap metric library outputs to match Trainer's expected dict format with string keys and numeric values.
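Several of the fixes above come down to normalizing the return value. One possible helper is sketched below; the name normalize_metrics is an assumption for illustration, not a transformers API. It rejects None and other non-mapping values, and converts every key to str and every value to float.

```python
from collections.abc import Mapping


def normalize_metrics(raw):
    """Return a dict[str, float] built from a metric library's raw output."""
    if not isinstance(raw, Mapping):
        raise ValueError(
            f"expected a mapping of metric names to numbers, got {type(raw).__name__}"
        )
    normalized = {}
    for key, value in raw.items():
        try:
            # float() accepts ints, floats, and other objects that define __float__.
            normalized[str(key)] = float(value)
        except (TypeError, ValueError) as exc:
            raise ValueError(
                f"metric {key!r} has non-numeric value {value!r}"
            ) from exc
    return normalized
```

A compute_metrics function could then end with return normalize_metrics(result) instead of returning the library output directly.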
Code: broken vs fixed
Broken:

from transformers import Trainer

def compute_metrics(pred):
    # Incorrect: the body does nothing, so the function returns None
    pass

# model and training_args are assumed to be defined earlier
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
)
trainer.evaluate()  # This line triggers ValueError

Fixed:

import os

# Example environment setup; set before importing Hugging Face libraries so it takes effect
os.environ['HF_HOME'] = '/tmp/hf_cache'

import evaluate
from transformers import Trainer

metric = evaluate.load('accuracy')

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = logits.argmax(axis=-1)
    result = metric.compute(predictions=predictions, references=labels)
    # Ensure keys are strings and values are floats
    return {str(k): float(v) for k, v in result.items()}

# model and training_args are assumed to be defined earlier
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
)
print(trainer.evaluate())  # Fixed: returns proper dict
Workaround
Wrap your compute_metrics call in try/except to catch ValueError, then log and manually convert metric outputs to the expected dict format before returning.
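The try/except approach might look like the sketch below. The names strict_validate and with_fallback are hypothetical, and the validator only treats built-in int and float as numeric, which is an assumption of this example. When validation fails, the wrapper logs the raw output, keeps the entries it can convert, and drops the rest.

```python
import logging

logger = logging.getLogger("metrics_fallback")  # hypothetical logger name


def strict_validate(metrics):
    """Raise ValueError unless metrics is a dict of str keys to int/float values."""
    ok = isinstance(metrics, dict) and all(
        isinstance(k, str) and isinstance(v, (int, float))
        for k, v in metrics.items()
    )
    if not ok:
        raise ValueError(
            "Metric function must return a dictionary with string keys and numeric values."
        )


def with_fallback(raw_metric_fn):
    """Wrap raw_metric_fn so invalid output is logged and repaired where possible."""

    def compute_metrics(eval_pred):
        raw = raw_metric_fn(eval_pred)
        try:
            strict_validate(raw)
            return raw
        except ValueError:
            logger.exception("Invalid metric output: %r", raw)
            if not isinstance(raw, dict):
                return {}  # nothing recoverable
            repaired = {}
            for key, value in raw.items():
                try:
                    repaired[str(key)] = float(value)
                except (TypeError, ValueError):
                    logger.warning("Dropping non-numeric metric %r=%r", key, value)
            return repaired

    return compute_metrics
```

Because the fallback can return an empty or partial dict, it is best treated as a stopgap that keeps a run alive while the underlying metric function is fixed.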
Prevention
Always validate metric function outputs during development and use standard metric libraries with wrappers that guarantee the correct output format for Trainer.
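One way to validate during development is a quick check that runs the metric function once on a small sample before any training starts. The helper below is a sketch under stated assumptions: check_metric_fn is a hypothetical name, and it is deliberately strict in accepting only built-in int and float values, so some NumPy scalar types would be reported until they are converted.

```python
def check_metric_fn(metric_fn, sample_input):
    """Call metric_fn once and return a list of problems found in its output."""
    output = metric_fn(sample_input)
    if not isinstance(output, dict):
        return [f"returned {type(output).__name__}, expected dict"]
    problems = []
    for key, value in output.items():
        if not isinstance(key, str):
            problems.append(f"key {key!r} is not a string")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(
                f"value for {key!r} is {type(value).__name__}, not a number"
            )
    return problems
```

An empty list means the output has the expected shape; in a unit test this could be asserted directly, for example assert check_metric_fn(compute_metrics, sample) == [], where sample is a small hand-built input.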