High severity · intermediate · Fix: 5-10 min

ValueError

transformers.Trainer.evaluate → compute_metrics → ValueError

What this error means
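Evaluation fails because the compute_metrics function passed to transformers.Trainer has the wrong signature or returns something other than a dict that maps metric names (strings) to numbers. The error typically appears at the first evaluation during fine-tuning.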

Stack trace

```text
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/transformers/trainer.py", line 1234, in evaluate
    metrics = self.compute_metrics(eval_preds)
  File "/app/train.py", line 45, in compute_metrics
    raise ValueError("Metric function must return a dict with string keys and numeric values.")
ValueError: Metric function must return a dict with string keys and numeric values.
```

Quick fix

Make compute_metrics return a dict with string keys and numeric values on every code path. This is the format Trainer expects.

Why it happens

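During evaluation, Trainer calls compute_metrics with a single EvalPrediction and expects back a dict that maps metric names (strings) to numbers. It then post-processes that dict: it converts NumPy scalars and tensors to Python numbers, adds the evaluation loss, and prefixes every key with "eval_". A None or non-dict result breaks this step, non-string keys break the prefixing, and non-numeric values can break logging integrations or metric_for_best_model comparisons. The ValueError in the stack trace above is raised by a validation check inside the user's own compute_metrics (train.py, line 45), which catches the problem before Trainer does. The usual root causes are a wrong function signature or metric-library output that is returned without being adapted.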
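To see why a None or otherwise malformed result breaks evaluation, here is a deliberately simplified model of the post-processing step. The function simulate_postprocess is hypothetical and is not Trainer's actual code; it only mirrors two steps Trainer performs on the returned dict: adding the evaluation loss and prefixing keys with "eval_".

```python
def simulate_postprocess(metrics, eval_loss, prefix="eval"):
    """Hypothetical, simplified model of Trainer's metric post-processing."""
    # The evaluation loss is added to the returned dict; item assignment fails on None
    metrics[f"{prefix}_loss"] = eval_loss
    # Each key is checked for the prefix; non-string keys have no .startswith()
    for key in list(metrics.keys()):
        if not key.startswith(f"{prefix}_"):
            metrics[f"{prefix}_{key}"] = metrics.pop(key)
    return metrics


print(simulate_postprocess({"accuracy": 0.9}, eval_loss=0.3))
# {'eval_loss': 0.3, 'eval_accuracy': 0.9}

# Both malformed results fail inside the post-processing, not in compute_metrics
for bad in (None, {0: 0.9}):
    try:
        simulate_postprocess(bad, eval_loss=0.3)
    except (TypeError, AttributeError) as exc:
        print(type(exc).__name__, exc)
```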

Detection

Log the type and contents of the value compute_metrics is about to return, and check that it is a dict with string keys and numeric values before handing it back to Trainer. If evaluation still fails, wrap trainer.evaluate() in try/except to capture the exception and compare it with the logged raw output.
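A minimal sketch of such a check, assuming a hypothetical helper named validate_metrics that you call on the last line of compute_metrics (for example `return validate_metrics(result)`). It logs the raw value and raises a ValueError with a message like the one in the stack trace. It accepts Python and NumPy numbers, so convert tensors with float() first.

```python
import logging
import numbers

logger = logging.getLogger(__name__)

MESSAGE = "Metric function must return a dict with string keys and numeric values."


def validate_metrics(result):
    """Log the raw compute_metrics output and raise early if its shape is wrong."""
    logger.info("compute_metrics returned %r (type %s)", result, type(result).__name__)
    if not isinstance(result, dict):
        raise ValueError(f"{MESSAGE} Got {type(result).__name__}.")
    for key, value in result.items():
        # numbers.Real covers int, float and NumPy integer/floating scalars
        if not isinstance(key, str) or not isinstance(value, numbers.Real):
            raise ValueError(f"{MESSAGE} Offending entry: {key!r}: {value!r}.")
    return result


print(validate_metrics({"accuracy": 0.91}))  # {'accuracy': 0.91}
```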

Causes & fixes

1

compute_metrics function returns None or a non-dictionary value

✓ Fix

Make sure every code path in compute_metrics ends with a return statement that yields a dict. A function that falls off the end (or whose body is just pass) returns None.
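A sketch of the difference, assuming for brevity that the predictions already hold class ids (for example after a preprocess_logits_for_metrics step) and that eval_pred unpacks into (predictions, labels). The _broken/_fixed names are illustrative only.

```python
def compute_metrics_broken(eval_pred):
    predicted, labels = eval_pred
    accuracy = sum(int(p == y) for p, y in zip(predicted, labels)) / len(labels)
    metrics = {"accuracy": accuracy}  # built but never returned, so the call yields None


def compute_metrics_fixed(eval_pred):
    predicted, labels = eval_pred
    if len(labels) == 0:
        return {}  # the edge case still returns a dict, not None
    accuracy = sum(int(p == y) for p, y in zip(predicted, labels)) / len(labels)
    return {"accuracy": accuracy}


print(compute_metrics_broken(([1, 0, 1], [1, 1, 1])))  # None
print(compute_metrics_fixed(([1, 0, 1], [1, 1, 1])))  # {'accuracy': 0.6666666666666666}
```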

2

Metric function returns a dictionary with non-string keys or non-numeric values

✓ Fix

Convert all keys to strings and all values to floats or ints before returning from compute_metrics.
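A minimal sketch of that conversion as a hypothetical helper, normalize_metrics, applied to whatever compute_metrics is about to return. It rejects strings explicitly because float("0.9") would otherwise succeed silently.

```python
def normalize_metrics(raw):
    """Return a copy of raw with str keys and float values (hypothetical helper)."""
    normalized = {}
    for key, value in raw.items():
        # float("0.9") and float(b"0.9") would succeed, so reject text explicitly
        if isinstance(value, (str, bytes)):
            raise ValueError(f"Metric {key!r} has non-numeric value {value!r}")
        try:
            # float() accepts ints, NumPy scalars and 0-d tensors alike
            normalized[str(key)] = float(value)
        except (TypeError, ValueError) as exc:
            raise ValueError(f"Metric {key!r} has non-numeric value {value!r}") from exc
    return normalized


print(normalize_metrics({0: 1, "f1": 0.75}))  # {'0': 1.0, 'f1': 0.75}
```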

3

Incorrect compute_metrics function signature or missing expected parameters

✓ Fix

Define compute_metrics to take exactly one argument, an EvalPrediction, and read the predictions and labels from its predictions and label_ids attributes (or unpack it as a tuple).
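A sketch of the expected signature. EvalPredictionStub is a hypothetical stand-in for transformers.EvalPrediction so the snippet runs without transformers installed; the real class exposes the same predictions and label_ids attributes. The argmax is written in plain Python here; in a real script np.argmax(logits, axis=-1) does the same job.

```python
from collections import namedtuple

# Hypothetical stand-in for transformers.EvalPrediction (same attribute names)
EvalPredictionStub = namedtuple("EvalPredictionStub", ["predictions", "label_ids"])


def compute_metrics(eval_pred):
    """Takes exactly one argument, because Trainer calls compute_metrics(eval_pred)."""
    logits = eval_pred.predictions
    # Some models return several outputs; the logits are the first element
    if isinstance(logits, tuple):
        logits = logits[0]
    labels = eval_pred.label_ids
    # Predicted class = index of the largest logit in each row
    predicted = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return {"accuracy": correct / len(labels)}


# A two-argument signature such as compute_metrics(predictions, labels) would
# raise TypeError, because Trainer passes a single EvalPrediction object.
sample = EvalPredictionStub(predictions=([[2.0, 0.5], [0.1, 3.0]], "extra output"), label_ids=[0, 1])
print(compute_metrics(sample))  # {'accuracy': 1.0}
```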

4

Passing metric-library output through without adapting it to Trainer's expected format

✓ Fix

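Wrap metric-library outputs so they match Trainer's expected format: flatten nested results (such as per-class or per-entity sub-dicts) and convert array values to scalars, so the returned dict contains only string keys and numeric values.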
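A sketch of such a wrapper, using a hypothetical flatten_metrics helper. The sample input is loosely modelled on per-entity NER output (a nested dict per entity plus overall scores) together with a per-class score list; joining key names with an underscore is an arbitrary choice.

```python
def flatten_metrics(raw, parent_key="", sep="_"):
    """Flatten nested dicts and sequences into {str: float} (hypothetical helper)."""
    flat = {}
    for key, value in raw.items():
        name = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Nested sub-dict, e.g. per-entity scores: recurse with a longer prefix
            flat.update(flatten_metrics(value, name, sep))
        elif isinstance(value, (list, tuple)) or getattr(value, "ndim", 0) > 0:
            # Per-class scores (list or array): one key per class index
            for i, item in enumerate(value):
                flat[f"{name}{sep}{i}"] = float(item)
        else:
            flat[name] = float(value)
    return flat


raw = {"PER": {"f1": 0.9, "number": 12}, "overall_f1": 0.85, "precision": [0.5, 1.0]}
print(flatten_metrics(raw))
# {'PER_f1': 0.9, 'PER_number': 12.0, 'overall_f1': 0.85, 'precision_0': 0.5, 'precision_1': 1.0}
```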

Code: broken vs fixed

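Broken - evaluation fails
```python
from transformers import Trainer

# model, training_args and eval_dataset are assumed to be defined earlier in the script


def compute_metrics(eval_pred):
    # Incorrect: the body never returns, so the function returns None
    pass


trainer = Trainer(
    model=model,
    args=training_args,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)
trainer.evaluate()  # Fails here: Trainer cannot post-process a None metrics result
```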
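Fixed - works correctly
```python
import numpy as np
import evaluate  # pip install evaluate; datasets.load_metric is deprecated and removed in datasets 3.0
from transformers import Trainer

# model, training_args and eval_dataset are assumed to be defined earlier in the script
metric = evaluate.load("accuracy")


def compute_metrics(eval_pred):
    # EvalPrediction unpacks into (predictions, label_ids)
    logits, labels = eval_pred
    # Some models return a tuple of outputs; the logits come first
    if isinstance(logits, tuple):
        logits = logits[0]
    predictions = np.argmax(logits, axis=-1)
    result = metric.compute(predictions=predictions, references=labels)
    # Ensure keys are strings and values are plain Python floats
    return {str(k): float(v) for k, v in result.items()}


trainer = Trainer(
    model=model,
    args=training_args,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)
print(trainer.evaluate())  # e.g. eval_loss, eval_accuracy and runtime statistics
```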
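The fixed compute_metrics takes a single EvalPrediction, turns logits into class predictions with argmax, computes accuracy with the loaded metric, and returns a dict with string keys and float values, which is the format Trainer expects.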

Workaround

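Wrap the body of compute_metrics in try/except ValueError. In the except branch, log the raw metric output and convert it to the expected dict format (string keys, float values) before returning. Treat this as a stopgap: it keeps evaluation running but can hide the underlying bug.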
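A minimal sketch of this workaround. raw_metric_fn and coerce_metrics are hypothetical names; raw_metric_fn stands in for your metric code and deliberately returns a non-string key and a nested value. Entries that cannot be converted are logged and dropped.

```python
import logging

logger = logging.getLogger(__name__)

MESSAGE = "Metric function must return a dict with string keys and numeric values."


def raw_metric_fn(eval_pred):
    # Hypothetical metric code with malformed entries: an int key and a nested dict
    return {"accuracy": 0.75, 1: 0.5, "report": {"f1": 0.7}}


def coerce_metrics(raw):
    """Best-effort conversion to {str: float}; logs and drops entries that fail."""
    if not isinstance(raw, dict):
        logger.warning("Metrics output %r is not a dict; returning {}", raw)
        return {}
    coerced = {}
    for key, value in raw.items():
        try:
            coerced[str(key)] = float(value)
        except (TypeError, ValueError):
            logger.warning("Dropping metric %r with non-numeric value %r", key, value)
    return coerced


def compute_metrics(eval_pred):
    raw = raw_metric_fn(eval_pred)
    try:
        # Strict path: a check like the one that raised the ValueError in the stack trace
        if not isinstance(raw, dict) or not all(isinstance(k, str) for k in raw):
            raise ValueError(MESSAGE)
        return {k: float(v) for k, v in raw.items()}
    except (TypeError, ValueError):
        # Workaround path: log the raw output, then salvage what converts cleanly
        logger.exception("Malformed metrics %r; coercing instead", raw)
        return coerce_metrics(raw)


print(compute_metrics(eval_pred=None))  # {'accuracy': 0.75, '1': 0.5}
```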

Prevention

Test compute_metrics on a small synthetic batch before launching a long training run, and route metric-library outputs through a helper that guarantees string keys and numeric values.
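One way to do this is a small pytest-style test that feeds compute_metrics a tiny synthetic batch and checks the output contract. EvalPredictionStub is a hypothetical stand-in for transformers.EvalPrediction, and the compute_metrics shown (which assumes predictions are already class ids) is a placeholder for your own.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for transformers.EvalPrediction (same attribute names)
EvalPredictionStub = namedtuple("EvalPredictionStub", ["predictions", "label_ids"])


def compute_metrics(eval_pred):
    # Placeholder under test: swap in the function you pass to Trainer
    predicted, labels = eval_pred.predictions, eval_pred.label_ids
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return {"accuracy": correct / len(labels)}


def test_compute_metrics_contract():
    # Tiny synthetic batch: three examples, predictions already class ids
    sample = EvalPredictionStub(predictions=[1, 0, 1], label_ids=[1, 0, 0])
    result = compute_metrics(sample)
    assert isinstance(result, dict), f"expected dict, got {type(result).__name__}"
    for key, value in result.items():
        assert isinstance(key, str), f"non-string key {key!r}"
        # float also covers NumPy float64, which subclasses it
        assert isinstance(value, (int, float)), f"non-numeric value {value!r} for {key!r}"
        assert math.isfinite(value), f"non-finite value {value!r} for {key!r}"
```

Saved in a file such as test_metrics.py, this runs with `pytest`, so a broken compute_metrics is caught in seconds rather than at the first evaluation step.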

Python 3.9+ · transformers >=4.0.0 · tested on 4.30.0
Verified 2026-04
