Precision, recall, and F1 for classification
Quick answer
Use the precision_score, recall_score, and f1_score functions to evaluate classification models. These metrics measure the accuracy of positive predictions, the ability to find all positive samples, and their harmonic mean, respectively, and can be computed easily with scikit-learn.
Prerequisites
- Python 3.8+
- pip install scikit-learn
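To make the definitions concrete before reaching for scikit-learn, here is a minimal hand-rolled sketch (the function name binary_metrics is illustrative, not part of any library) that computes all three metrics from true/false positive counts for binary labels:

```python
# Precision = TP / (TP + FP), recall = TP / (TP + FN),
# F1 = harmonic mean of precision and recall.
def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# 2 true positives, 1 false positive, 1 false negative:
print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

This is what the scikit-learn functions below compute for you, with additional input validation and multi-class support.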
Setup
Install scikit-learn if you haven't already. This library provides built-in functions to calculate precision, recall, and F1 score for classification tasks.
pip install scikit-learn

Output:
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/site-packages (1.3.0)
Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.10/site-packages (from scikit-learn) (1.25.0)
Requirement already satisfied: scipy>=1.5.0 in /usr/local/lib/python3.10/site-packages (from scikit-learn) (1.11.1)
Step by step
Use precision_score, recall_score, and f1_score from sklearn.metrics to compute these metrics from true and predicted labels.
from sklearn.metrics import precision_score, recall_score, f1_score
# Example true labels and predicted labels
true_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
pred_labels = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
# Calculate precision, recall, and F1 score
precision = precision_score(true_labels, pred_labels)
recall = recall_score(true_labels, pred_labels)
f1 = f1_score(true_labels, pred_labels)
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")

Output:
Precision: 0.80
Recall: 0.80
F1 Score: 0.80
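If you want all three metrics per class in one call, scikit-learn also provides classification_report, which summarizes precision, recall, and F1 along with support counts (shown here with the same labels as above):

```python
from sklearn.metrics import classification_report

true_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
pred_labels = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Prints per-class precision/recall/F1 plus macro and weighted averages.
print(classification_report(true_labels, pred_labels))
```

Pass output_dict=True if you need the same numbers as a dictionary for programmatic use.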
Common variations
You can compute metrics for multi-class classification by specifying the average parameter (e.g., macro, micro, weighted). For example, f1_score(y_true, y_pred, average='macro').
These functions also integrate cleanly into larger model-evaluation pipelines.
from sklearn.metrics import f1_score
# Multi-class example
true_labels = [0, 1, 2, 2, 1]
pred_labels = [0, 2, 2, 2, 0]
f1_macro = f1_score(true_labels, pred_labels, average='macro')
f1_micro = f1_score(true_labels, pred_labels, average='micro')
print(f"F1 Macro: {f1_macro:.2f}")
print(f"F1 Micro: {f1_micro:.2f}")

Output:
F1 Macro: 0.49
F1 Micro: 0.60
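To see which classes drag the macro average down, pass average=None to get one score per class (same toy labels as above; zero_division=0 is added because class 1 is never predicted):

```python
from sklearn.metrics import f1_score

true_labels = [0, 1, 2, 2, 1]
pred_labels = [0, 2, 2, 2, 0]

# average=None returns an array with one F1 score per class.
per_class = f1_score(true_labels, pred_labels, average=None, zero_division=0)
print(per_class)  # class 1 is never predicted, so its F1 is 0.0
```

The macro score is simply the unweighted mean of these per-class values, which is why a single missed class can lower it sharply.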
Troubleshooting
- If you get an UndefinedMetricWarning, there are no positive predictions for a class; consider setting zero_division=0 in the metric functions.
- Ensure your true and predicted label arrays have the same length and use consistent label encoding.
precision = precision_score(true_labels, pred_labels, zero_division=0)
recall = recall_score(true_labels, pred_labels, zero_division=0)
f1 = f1_score(true_labels, pred_labels, zero_division=0)

Output (with the same print statements as above):
Precision: 0.80
Recall: 0.80
F1 Score: 0.80
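As a concrete illustration of the warning case, here is a hypothetical set of labels where the model makes no positive predictions at all; with zero_division=0 the undefined precision is reported as 0.0 instead of emitting UndefinedMetricWarning:

```python
from sklearn.metrics import precision_score

true_labels = [1, 0, 1]
pred_labels = [0, 0, 0]  # no positive predictions at all

# Precision is undefined here (TP + FP == 0); zero_division=0
# replaces the undefined value with 0.0 and suppresses the warning.
precision = precision_score(true_labels, pred_labels, zero_division=0)
print(precision)  # 0.0
```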
Key Takeaways
- Use scikit-learn metrics to easily compute precision, recall, and F1 for classification.
- Specify the average parameter for multi-class classification metrics.
- Handle zero-division warnings by setting zero_division=0 in metric functions.