Concept beginner · 3 min read

What is the F1 score in machine learning?

Quick answer
The F1 score is the harmonic mean of precision and recall: a single classification metric that balances false positives and false negatives. It is especially useful for evaluating models on imbalanced datasets, where plain accuracy can be misleading, and it can be computed in PyTorch from predictions and true labels.

How it works

The F1 score is calculated as the harmonic mean of precision and recall. Precision measures how many predicted positives are actually correct, while recall measures how many actual positives were correctly identified. The formula is:

F1 = 2 * (precision * recall) / (precision + recall)

This metric is useful because it balances the trade-off between false positives and false negatives, making it ideal for imbalanced classification problems where accuracy alone can be misleading.
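As a quick sanity check of the formula, here is a worked example in plain Python using hypothetical confusion-matrix counts (3 true positives, 1 false positive, 1 false negative, chosen for illustration):

```python
# Hypothetical confusion-matrix counts
tp, fp, fn = 3, 1, 1

precision = tp / (tp + fp)  # 3/4 = 0.75
recall = tp / (tp + fn)     # 3/4 = 0.75

# Harmonic mean of precision and recall
f1 = 2 * (precision * recall) / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.75 recall=0.75 f1=0.75
```

Note that when precision and recall are equal, the F1 score equals them both; the harmonic mean only drags the score down when the two diverge.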

Concrete example

Here is a simple example of computing the F1 score in PyTorch for a binary classification task:

python
import torch
from sklearn.metrics import f1_score

# True labels and predicted labels
true_labels = torch.tensor([1, 0, 1, 1, 0, 1, 0, 0])
pred_labels = torch.tensor([1, 0, 0, 1, 0, 1, 1, 0])

# Convert tensors to numpy arrays for sklearn
true_np = true_labels.numpy()
pred_np = pred_labels.numpy()

# Calculate F1 score
f1 = f1_score(true_np, pred_np)
print(f"F1 score: {f1:.2f}")
output
F1 score: 0.75
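If you prefer to stay entirely in PyTorch without converting to numpy, the same counts can be computed directly with tensor operations. This is a minimal sketch for binary 0/1 labels (the helper name `binary_f1` is ours, not a PyTorch API):

```python
import torch

def binary_f1(true_labels: torch.Tensor, pred_labels: torch.Tensor) -> float:
    # Count true positives, false positives, and false negatives
    tp = ((pred_labels == 1) & (true_labels == 1)).sum().item()
    fp = ((pred_labels == 1) & (true_labels == 0)).sum().item()
    fn = ((pred_labels == 0) & (true_labels == 1)).sum().item()
    # Guard against division by zero when a class is never predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

true_labels = torch.tensor([1, 0, 1, 1, 0, 1, 0, 0])
pred_labels = torch.tensor([1, 0, 0, 1, 0, 1, 1, 0])
print(f"F1 score: {binary_f1(true_labels, pred_labels):.2f}")  # F1 score: 0.75
```

For production evaluation loops, a dedicated metrics library (such as torchmetrics) avoids hand-rolled edge-case handling, but the tensor arithmetic above shows exactly what is being counted.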

When to use it

Use the F1 score when you need a balance between precision and recall, especially in cases of imbalanced datasets where one class is much rarer than the other. It is not ideal when you want to prioritize either precision or recall exclusively. For example, use it in fraud detection, medical diagnosis, or spam detection where false positives and false negatives have different costs but both matter.

Key terms

Precision: Ratio of true positives to all predicted positives.
Recall: Ratio of true positives to all actual positives.
False Positive: An instance incorrectly predicted as positive.
False Negative: An instance incorrectly predicted as negative.
Harmonic Mean: A type of average that balances two rates; used in the F1 score.

Key Takeaways

  • The F1 score balances precision and recall into a single metric for classification evaluation.
  • Use F1 score on imbalanced datasets where accuracy can be misleading.
  • PyTorch tensors can be converted to numpy arrays to compute F1 score with sklearn.
  • F1 score is not suitable if you need to prioritize precision or recall individually.
  • Understanding false positives and false negatives is critical to interpreting F1 score.