
How to test machine learning models

Quick answer

To test a machine learning model, evaluate it on a separate test dataset that it never saw during training, using metrics such as accuracy, precision, and recall. Techniques such as cross-validation and confusion matrices help you estimate how well the model generalizes and detect overfitting.
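For a binary problem, precision and recall come straight from the cells of the confusion matrix: precision = TP / (TP + FP) and recall = TP / (TP + FN). The minimal sketch below uses a small set of made-up labels (hypothetical, for illustration only) and checks the hand-computed values against scikit-learn's precision_score and recall_score.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels for a binary task (1 = positive class), for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# For binary labels [0, 1], ravel() returns the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # share of predicted positives that are actually positive
recall = tp / (tp + fn)     # share of actual positives that the model found

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Precision: {precision:.2f} (sklearn: {precision_score(y_true, y_pred):.2f})")
print(f"Recall:    {recall:.2f} (sklearn: {recall_score(y_true, y_pred):.2f})")
```

With these labels the counts are TP=3, FP=2, FN=1, TN=4, so precision is 3/5 = 0.60 and recall is 3/4 = 0.75: the model finds most positives but also raises two false alarms.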

PREREQUISITES

Python 3.8+
pip install "scikit-learn>=1.2"
Basic knowledge of machine learning concepts

Setup

Install scikit-learn, which provides the datasets, models, and evaluation metrics used in this guide. Quote the requirement so the shell does not treat >= as an output redirect.

bash

pip install "scikit-learn>=1.2"

Step by step

This example splits the Iris dataset into training and test sets, trains a random forest classifier, and tests the model using accuracy and a confusion matrix.

python

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict on test set
predictions = model.predict(X_test)

# Evaluate
acc = accuracy_score(y_test, predictions)
cm = confusion_matrix(y_test, predictions)

print(f"Accuracy: {acc:.2f}")
print("Confusion Matrix:\n", cm)

output

Accuracy: 1.00
Confusion Matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
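The example above reports only accuracy and a confusion matrix. A minimal sketch for adding the precision and recall mentioned in the quick answer is scikit-learn's classification_report; the same split and model are repeated so the snippet runs on its own. The choice of macro averaging for the summary numbers is an assumption: "weighted" and "micro" averaging are also available and can give different results on imbalanced data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Same data, split, and model as in the step-by-step example
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
predictions = model.predict(X_test)

# Per-class precision, recall, F1 score, and support, labelled with species names
print(classification_report(y_test, predictions, target_names=iris.target_names))

# A single multi-class score needs an averaging strategy;
# "macro" gives every class equal weight regardless of how many samples it has
macro_precision = precision_score(y_test, predictions, average="macro")
macro_recall = recall_score(y_test, predictions, average="macro")
print(f"Macro precision: {macro_precision:.2f}, macro recall: {macro_recall:.2f}")
```

On a test set of only 30 samples, a single misclassification noticeably changes the per-class scores, which is one reason to follow up with cross-validation.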

Common variations

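Cross-validation gives a more reliable estimate than a single split: the data is divided into several folds, and the model is trained and evaluated once per fold, with a different fold held out each time. The same approach works with any scikit-learn classifier; the example below swaps in GradientBoostingClassifier. For very large datasets, where retraining many times is expensive, a single held-out split or streaming evaluation on batches of data may be more practical.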

python

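from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Load the same Iris data as in the previous example
X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: five fits, each scored on a different held-out fold
model = GradientBoostingClassifier(random_state=42)
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
print(f"Cross-validation accuracies: {scores.round(2)}")
print(f"Mean accuracy: {scores.mean():.2f}")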

output

Cross-validation accuracies: [1.   0.97 1.   0.97 1.  ]
Mean accuracy: 0.99
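To make "splitting data multiple times" concrete, the sketch below writes out roughly what cross_val_score(model, X, y, cv=5) does for a classifier: build five stratified folds, fit a fresh copy of the model on four of them, score it on the fifth, and repeat. It assumes scikit-learn's default for an integer cv with a classifier, which is StratifiedKFold without shuffling, and the last lines check that the hand-written loop agrees with cross_val_score.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
base_model = GradientBoostingClassifier(random_state=42)

# Five folds that keep class proportions roughly equal in each fold (no shuffling)
skf = StratifiedKFold(n_splits=5)

fold_scores = []
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
    # clone() gives an unfitted copy, so no fold reuses a model trained on another
    model = clone(base_model)
    model.fit(X[train_idx], y[train_idx])
    score = accuracy_score(y[test_idx], model.predict(X[test_idx]))
    fold_scores.append(score)
    print(f"Fold {fold}: accuracy {score:.2f}")

fold_scores = np.array(fold_scores)
print(f"Mean accuracy: {fold_scores.mean():.2f} (std {fold_scores.std():.2f})")

# Sanity check: the manual loop should match cross_val_score with cv=5
reference = cross_val_score(clone(base_model), X, y, cv=5, scoring="accuracy")
matches = np.allclose(fold_scores, reference)
print(f"Matches cross_val_score: {matches}")
```

Reporting the standard deviation alongside the mean shows how much the estimate moves from fold to fold, which a single train/test split cannot tell you.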

Troubleshooting

If your model scores much higher on the training set than on the test set, it is probably overfitting; try regularization, a simpler model, or more training data. If metrics are unexpectedly low, verify your data preprocessing and check that the labels are correct.
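One way to act on the overfitting tip is to print training and test accuracy side by side. The sketch below is an illustration under assumed settings: a synthetic, deliberately noisy dataset from make_classification (flip_y=0.2 assigns a random class to roughly 20% of samples) and a comparison between an unrestricted decision tree and one limited by max_depth, a simple form of regularization for trees. The dataset size, noise level, and depth are arbitrary choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise, so perfect test accuracy is unrealistic
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

results = {}
for name, max_depth in [("unrestricted tree", None), ("max_depth=3", 3)]:
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    results[name] = (train_acc, test_acc)
    # A large gap between training and test accuracy is the usual sign of overfitting
    print(f"{name}: train {train_acc:.2f}, test {test_acc:.2f}, gap {train_acc - test_acc:.2f}")
```

The unrestricted tree keeps splitting until it memorizes the training set, noise included, so its training accuracy reaches 1.00 while its test accuracy stays well below that; the depth-limited tree cannot memorize the noise, which typically narrows the gap.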


Key Takeaways

Always evaluate machine learning models on a separate test set to estimate how they will perform on unseen data.
Use metrics like accuracy, precision, recall, and confusion matrices to understand model behavior.
Cross-validation provides a more robust estimate of model generalization than a single train-test split.

Verified 2026-04
