How-to · Beginner · 3 min read

How to use early stopping in XGBoost

Quick answer
Set the early_stopping_rounds parameter (on the XGBClassifier constructor in recent versions, or as an argument to xgboost.train) and supply a validation set so that training stops when the evaluation metric stops improving. This helps prevent overfitting and reduces training time.

PREREQUISITES

  • Python 3.8+
  • pip install xgboost>=1.7.0
  • pip install scikit-learn>=1.0

Setup

Install xgboost and scikit-learn if not already installed. Import necessary libraries and prepare your dataset.

bash
pip install xgboost scikit-learn

Step by step

This example shows how to use early stopping with XGBClassifier on the Iris dataset. We split the data into training and validation sets, set early_stopping_rounds on the estimator, and pass eval_set to fit. (Passing early_stopping_rounds to fit was deprecated in xgboost 1.6 and removed in 2.0, so it belongs in the constructor.) Training stops if the validation metric does not improve for the specified number of rounds.

python
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split into train and validation
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize model; since xgboost 1.6, early_stopping_rounds goes here
model = xgb.XGBClassifier(
    objective='multi:softprob',
    eval_metric='mlogloss',
    n_estimators=500,
    early_stopping_rounds=10,
    random_state=42
)

# Train with early stopping on the validation set
model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    verbose=True
)

# Predict and evaluate
preds = model.predict(X_val)
accuracy = accuracy_score(y_val, preds)
print(f"Validation accuracy: {accuracy:.4f}")
output
Validation accuracy: 1.0000

Common variations

  • Use xgboost.train with DMatrix for more control over training and early stopping.
  • Change eval_metric to suit your problem (e.g., auc for binary classification).
  • Adjust early_stopping_rounds to control patience before stopping.
python
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load binary classification data
data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Create DMatrix
train_dmatrix = xgb.DMatrix(X_train, label=y_train)
val_dmatrix = xgb.DMatrix(X_val, label=y_val)

params = {
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    'seed': 42
}

# Train with early stopping
bst = xgb.train(
    params,
    train_dmatrix,
    num_boost_round=1000,
    evals=[(val_dmatrix, 'validation')],
    early_stopping_rounds=20,
    verbose_eval=True
)

print(f"Best iteration: {bst.best_iteration}")
output
Prints the validation AUC at each iteration, followed by the best iteration number.

Troubleshooting

  • If early stopping does not trigger, check that eval_set or evals is correctly specified and that n_estimators / num_boost_round is large enough to leave room for stopping early.
  • Ensure the evaluation metric matches your problem type and is supported by XGBoost.
  • Verbose output helps verify training progress and early stopping behavior.

Key Takeaways

  • Use early_stopping_rounds with a validation set to prevent overfitting in XGBoost.
  • Pass eval_set in XGBClassifier.fit or evals in xgboost.train for monitoring.
  • Adjust early_stopping_rounds and eval_metric based on your dataset and task.
  • Verbose training output helps confirm early stopping is working as expected.
Verified 2026-04 · xgboost 1.7.0+