What is F1 score in machine learning
The F1 score is the harmonic mean of precision and recall, used to measure a model's performance on imbalanced classification tasks. It balances false positives and false negatives, providing a single metric that captures both. In PyTorch, it can be computed from predictions and true labels to evaluate classification performance.
How it works
The F1 score is calculated as the harmonic mean of precision and recall. Precision measures how many predicted positives are actually correct, while recall measures how many actual positives were correctly identified. The formula is:
F1 = 2 * (precision * recall) / (precision + recall)
This metric is useful because it balances the trade-off between false positives and false negatives, making it ideal for imbalanced classification problems where accuracy alone can be misleading.
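As a quick sketch, the formula can be applied directly to confusion-matrix counts (the counts below are hypothetical, chosen for illustration):

```python
# Hypothetical confusion-matrix counts
tp, fp, fn = 3, 1, 1  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # 3/4 = 0.75
recall = tp / (tp + fn)     # 3/4 = 0.75
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # 0.75
```

Because the harmonic mean is dominated by the smaller of the two rates, F1 stays low unless precision and recall are both high.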
Concrete example
Here is a simple example of computing the F1 score in PyTorch for a binary classification task:
```python
import torch
from sklearn.metrics import f1_score

# True labels and predicted labels
true_labels = torch.tensor([1, 0, 1, 1, 0, 1, 0, 0])
pred_labels = torch.tensor([1, 0, 0, 1, 0, 1, 1, 0])

# Convert tensors to numpy arrays for sklearn
true_np = true_labels.numpy()
pred_np = pred_labels.numpy()

# Calculate F1 score
f1 = f1_score(true_np, pred_np)
print(f"F1 score: {f1:.2f}")  # F1 score: 0.75
```
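If you prefer to avoid the round-trip through scikit-learn, the same score can be computed with tensor operations alone. The `f1_binary` helper below is a hypothetical sketch, not a PyTorch API:

```python
import torch

def f1_binary(true_labels: torch.Tensor, pred_labels: torch.Tensor) -> float:
    """Binary F1 from 0/1 label tensors, using boolean tensor ops."""
    tp = ((pred_labels == 1) & (true_labels == 1)).sum().item()
    fp = ((pred_labels == 1) & (true_labels == 0)).sum().item()
    fn = ((pred_labels == 0) & (true_labels == 1)).sum().item()
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0, so F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

true_labels = torch.tensor([1, 0, 1, 1, 0, 1, 0, 0])
pred_labels = torch.tensor([1, 0, 0, 1, 0, 1, 1, 0])
print(f"F1 score: {f1_binary(true_labels, pred_labels):.2f}")  # F1 score: 0.75
```

This keeps the computation on-device for GPU tensors until the final `.item()` calls, which can matter when evaluating inside a training loop.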
When to use it
Use the F1 score when you need a balance between precision and recall, especially in cases of imbalanced datasets where one class is much rarer than the other. It is not ideal when you want to prioritize either precision or recall exclusively. For example, use it in fraud detection, medical diagnosis, or spam detection where false positives and false negatives have different costs but both matter.
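A small sketch of why accuracy alone misleads on imbalanced data (the 95/5 class split and the always-negative model are hypothetical):

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced dataset: 95 negatives, 5 positives,
# and a degenerate model that predicts "negative" for everything
true = [0] * 95 + [1] * 5
pred = [0] * 100

print(accuracy_score(true, pred))                 # 0.95 -- looks strong
print(f1_score(true, pred, zero_division=0))      # 0.0  -- model finds no positives
```

Accuracy rewards the model for matching the majority class, while F1 exposes that it never identifies a single positive case.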
Key terms
| Term | Definition |
|---|---|
| Precision | Ratio of true positives to all predicted positives. |
| Recall | Ratio of true positives to all actual positives. |
| False Positive | A negative instance incorrectly predicted as positive. |
| False Negative | A positive instance incorrectly predicted as negative. |
| Harmonic Mean | An average dominated by the smaller of the values combined; used in the F1 score so both precision and recall must be high. |
Key Takeaways
- The F1 score balances precision and recall into a single metric for classification evaluation.
- Use F1 score on imbalanced datasets where accuracy can be misleading.
- PyTorch tensors can be converted to numpy arrays to compute F1 score with sklearn.
- F1 score is not suitable if you need to prioritize precision or recall individually.
- Understanding false positives and false negatives is critical to interpreting F1 score.