
How to use cross-validation in scikit-learn

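Q: How do I use cross-validation in scikit-learn?

Use cross_val_score or cross_validate from sklearn.model_selection. Both split your dataset into training and validation folds for you, fit a fresh copy of the model on each training split, and score it on the held-out fold. cross_val_score returns one score per fold. cross_validate can compute several metrics at once and also reports fit and score times. Averaging over the folds gives a more reliable performance estimate than a single train/test split.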

PREREQUISITES

Python 3.8+
pip install "scikit-learn>=1.2"

Setup

Install scikit-learn if you haven't already. This example uses Python 3.8+ and scikit-learn 1.2 or newer. Quote the version specifier so your shell doesn't treat >= as an output redirect.

bash

pip install "scikit-learn>=1.2"

Step by step

This example uses cross_val_score to evaluate a logistic regression model on the Iris dataset with 5-fold cross-validation.

python

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Initialize model
model = LogisticRegression(max_iter=200)

# 5-fold cross-validation; for a classifier, cv=5 means an unshuffled StratifiedKFold
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')

print(f"Cross-validation accuracy scores: {scores}")
print(f"Mean accuracy: {scores.mean():.3f}")

output

Cross-validation accuracy scores: [1.         0.96666667 0.96666667 0.96666667 1.        ]
Mean accuracy: 0.980
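To see what cross_val_score does with cv=5, you can reproduce it with an explicit loop. The sketch below is simplified: the real function also handles parallelism, error_score, and other options. It uses check_cv to show which splitter an integer cv becomes for a classifier. It then fits a clone of the model on each training split and scores the held-out fold with accuracy_score. Reporting the standard deviation next to the mean is an added suggestion, not part of the original example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, check_cv, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# For a classifier, an integer cv is turned into an unshuffled StratifiedKFold
splitter = check_cv(5, y, classifier=True)
print(splitter)

manual_scores = []
for train_idx, test_idx in splitter.split(X, y):
    # Fit a fresh, unfitted copy of the model on the training folds only
    fold_model = clone(model).fit(X[train_idx], y[train_idx])
    # Score the fitted copy on the held-out fold
    manual_scores.append(accuracy_score(y[test_idx], fold_model.predict(X[test_idx])))
manual_scores = np.array(manual_scores)

# The built-in helper should give the same per-fold scores
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(manual_scores)
print(scores)
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

The standard deviation shows how much the estimate moves from fold to fold, which the mean alone hides.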

Common variations

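Use cross_validate to compute several metrics at once and to get fit and score times.
Pass a splitter object such as StratifiedKFold(shuffle=True) as cv to control how the folds are built. For classifiers, an integer cv already uses StratifiedKFold, but without shuffling.
Use other scoring metrics such as f1_macro or roc_auc_ovr. Plain roc_auc works only for binary targets, so a multiclass dataset like Iris needs roc_auc_ovr or roc_auc_ovo.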

python

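from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate, StratifiedKFold

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# Shuffle before splitting; a fixed random_state keeps the folds reproducible
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scoring = ['accuracy', 'f1_macro']

# return_train_score=False (the default) skips scoring on the training folds
results = cross_validate(model, X, y, cv=cv, scoring=scoring, return_train_score=False)

print(f"Accuracy scores: {results['test_accuracy']}")
print(f"F1 macro scores: {results['test_f1_macro']}")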

output

Accuracy scores: [1.         0.96666667 0.93333333 0.96666667 1.        ]
F1 macro scores: [1.         0.96658312 0.93069307 0.96658312 1.        ]
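The same call can report the fit times and a multiclass ROC AUC. The sketch below assumes the same Iris setup and shuffled StratifiedKFold as above. It uses the built-in scorer name roc_auc_ovr and turns on return_train_score so you can compare training and test scores. It prints every array that cross_validate returns rather than a fixed expected output, because timings differ between machines.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# 'roc_auc_ovr' is one-vs-rest ROC AUC for multiclass targets
results = cross_validate(
    model,
    X,
    y,
    cv=cv,
    scoring=["accuracy", "roc_auc_ovr"],
    return_train_score=True,
)

# The result is a dict of arrays with one entry per fold:
# fit_time, score_time, and test_/train_ scores for each metric
for key in sorted(results):
    print(f"{key}: {results[key].round(3)}")
```

A large gap between the train_ and test_ arrays is a quick sign of overfitting.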

Troubleshooting

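If you get a ConvergenceWarning from LogisticRegression, increase max_iter or standardize the features (for example, with StandardScaler inside a Pipeline).
If scores vary widely between folds, use stratified splits so each fold keeps the class proportions, or repeat the splits with RepeatedStratifiedKFold and average the results.
If your rows are sorted (Iris, for example, is ordered by class), set shuffle=True on the splitter and fix random_state for reproducibility. If the order carries meaning, as in time series, don't shuffle; use TimeSeriesSplit instead.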
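The first two fixes can be combined, as in the sketch below. It assumes the Iris setup used earlier. StandardScaler sits inside a pipeline, so the scaler is refit on each training split and never sees the held-out fold. RepeatedStratifiedKFold repeats 5-fold splitting with different shuffles. The choice of 3 repeats and random_state=42 is arbitrary, for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The scaler is fit inside each training split, so no information from the
# held-out fold leaks into preprocessing; scaled inputs also help lbfgs converge
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))

# 5 stratified folds, repeated 3 times with different shuffles -> 15 scores
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")

print(f"{len(scores)} scores, mean {scores.mean():.3f}, std {scores.std():.3f}")
```

Averaging 15 scores instead of 5 makes the mean less sensitive to how any single split happened to fall.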


Key Takeaways

Use cross_val_score for quick model evaluation with cross-validation.
Customize cross-validation with different splitters and scoring metrics using cross_validate.
Check for convergence warnings and decide deliberately whether to shuffle, so the fold scores are reliable.

Verified 2026-04
