How-to · Beginner · 3 min read

Confusion matrix interpretation

Quick answer
A confusion matrix summarizes classification results as counts of true positives, true negatives, false positives, and false negatives. From these counts you can derive metrics such as accuracy, precision, and recall to judge prediction quality.

PREREQUISITES

  • Python 3.8+
  • pip install scikit-learn
  • Basic understanding of classification tasks

Setup

Install scikit-learn for generating and interpreting confusion matrices in Python. Set up your environment with the following command:

bash
pip install scikit-learn
output
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/site-packages (1.3.0)
Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.10/site-packages (from scikit-learn) (1.25.0)
Requirement already satisfied: scipy>=1.5.0 in /usr/local/lib/python3.10/site-packages (from scikit-learn) (1.11.1)

Step by step

Use scikit-learn to compute and interpret a confusion matrix for a binary classification example. The matrix shows counts of TP, TN, FP, and FN. From these, calculate accuracy, precision, and recall.

python
from sklearn.metrics import confusion_matrix
import numpy as np

# True labels and predicted labels
true_labels = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
pred_labels = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

# Compute confusion matrix
cm = confusion_matrix(true_labels, pred_labels)

# cm layout: [[TN FP]
#             [FN TP]]
TN, FP, FN, TP = cm.ravel()

# Calculate metrics
accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP) if (TP + FP) > 0 else 0
recall = TP / (TP + FN) if (TP + FN) > 0 else 0

print(f"Confusion matrix:\n{cm}")
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
output
Confusion matrix:
[[4 1]
 [1 4]]
Accuracy: 0.80
Precision: 0.80
Recall: 0.80
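As a sanity check on the hand-derived formulas above, scikit-learn also ships ready-made metric functions that should agree with the values computed from the matrix:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score
import numpy as np

# Same labels as in the step-by-step example
true_labels = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
pred_labels = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

# Each built-in should match the value computed by hand from TP/TN/FP/FN
print(f"Accuracy:  {accuracy_score(true_labels, pred_labels):.2f}")   # 0.80
print(f"Precision: {precision_score(true_labels, pred_labels):.2f}")  # 0.80
print(f"Recall:    {recall_score(true_labels, pred_labels):.2f}")     # 0.80
```

Using the built-ins avoids subtle mistakes in the manual arithmetic while keeping the confusion matrix as the interpretable summary.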

Common variations

Confusion matrices extend naturally to multiclass classification: confusion_matrix accepts any number of classes, and the labels parameter fixes the row and column order. The choice of model or library does not change how the matrix is interpreted; it only affects the prediction quality the matrix reflects.

python
from sklearn.metrics import confusion_matrix

# Multiclass example
true_labels = ["cat", "dog", "cat", "bird", "dog", "bird"]
pred_labels = ["dog", "dog", "cat", "bird", "cat", "bird"]

cm = confusion_matrix(true_labels, pred_labels, labels=["cat", "dog", "bird"])
print("Multiclass confusion matrix:")
print(cm)
output
Multiclass confusion matrix:
[[1 1 0]
 [1 1 0]
 [0 0 2]]
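In scikit-learn's layout, rows are true classes and columns are predicted classes, so per-class recall is the diagonal divided by the row sums, and per-class precision is the diagonal divided by the column sums. A minimal sketch using the same multiclass example:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

true_labels = ["cat", "dog", "cat", "bird", "dog", "bird"]
pred_labels = ["dog", "dog", "cat", "bird", "cat", "bird"]
labels = ["cat", "dog", "bird"]

cm = confusion_matrix(true_labels, pred_labels, labels=labels)

# Rows = true classes, columns = predicted classes, so:
# recall    = correct predictions / all true instances   (diagonal / row sums)
# precision = correct predictions / all predicted        (diagonal / column sums)
recall_per_class = np.diag(cm) / cm.sum(axis=1)
precision_per_class = np.diag(cm) / cm.sum(axis=0)

for name, r, p in zip(labels, recall_per_class, precision_per_class):
    print(f"{name}: recall={r:.2f}, precision={p:.2f}")
```

Here "bird" scores 1.00 on both metrics because its two instances sit on the diagonal, while "cat" and "dog" each score 0.50 due to the off-diagonal confusions.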

Troubleshooting

  • If you see unexpected confusion matrix shapes, verify your true_labels and pred_labels arrays have matching lengths.
  • If metrics like precision or recall are NaN or zero, check for zero division when no positive predictions or true positives exist.
  • For multiclass, ensure you specify the labels parameter to maintain consistent class order.
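For the zero-division case above, scikit-learn's metric functions accept a zero_division parameter that sets the value returned when a denominator is zero, instead of emitting a warning. A small sketch with a degenerate classifier that never predicts the positive class:

```python
from sklearn.metrics import precision_score, recall_score

# Degenerate case: the model never predicts class 1,
# so TP + FP = 0 and precision is undefined
true_labels = [1, 1, 0, 0]
pred_labels = [0, 0, 0, 0]

# zero_division controls the value returned when the denominator is 0
p = precision_score(true_labels, pred_labels, zero_division=0)
r = recall_score(true_labels, pred_labels, zero_division=0)
print(p, r)  # 0.0 0.0
```

This mirrors the manual `if (TP + FP) > 0 else 0` guard from the step-by-step code.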

Key Takeaways

  • A confusion matrix shows counts of true/false positives and negatives to evaluate classification.
  • Calculate accuracy, precision, and recall from confusion matrix values to measure model performance.
  • Use scikit-learn for easy confusion matrix computation and interpretation in Python.
  • For multiclass tasks, specify class labels to get a consistent confusion matrix layout.
  • Check input label arrays and handle zero division to avoid metric calculation errors.
Verified 2026-04