How to tune XGBoost hyperparameters
Quick answer
To tune XGBoost hyperparameters, use techniques like grid search or randomized search with scikit-learn's GridSearchCV or RandomizedSearchCV. Focus on key parameters such as max_depth, learning_rate, n_estimators, and subsample to optimize model accuracy and prevent overfitting.

Prerequisites
- Python 3.8+
- pip install xgboost scikit-learn numpy
- Basic knowledge of Python and machine learning
Setup
Install xgboost and scikit-learn libraries if not already installed. Import necessary modules and prepare your dataset.
pip install xgboost scikit-learn numpy

Step by step
This example demonstrates tuning XGBoost hyperparameters using GridSearchCV on a classification dataset.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
# Load dataset
X, y = load_breast_cancer(return_X_y=True)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define model
model = XGBClassifier(use_label_encoder=False, eval_metric='logloss')
# Define hyperparameter grid
param_grid = {
'max_depth': [3, 5, 7],
'learning_rate': [0.01, 0.1, 0.2],
'n_estimators': [50, 100, 200],
'subsample': [0.8, 1.0]
}
# Setup GridSearchCV
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, scoring='accuracy', verbose=1)
# Fit grid search
grid_search.fit(X_train, y_train)
# Best parameters
print('Best hyperparameters:', grid_search.best_params_)
# Evaluate best model
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
print('Test accuracy:', accuracy_score(y_test, y_pred))

Output
Fitting 3 folds for each of 54 candidates, totalling 162 fits
Best hyperparameters: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100, 'subsample': 1.0}
Test accuracy: 0.956140350877193

Common variations
RandomizedSearchCV samples a fixed number of parameter settings instead of exhaustively trying every combination, which makes tuning faster when the grid is large. Alternatively, libraries like Optuna or Ray Tune offer more advanced (e.g. Bayesian) optimization. For finer-grained control, also tune colsample_bytree, gamma, and min_child_weight.
from sklearn.model_selection import RandomizedSearchCV
import scipy.stats as stats
param_dist = {
'max_depth': stats.randint(3, 10),
'learning_rate': stats.uniform(0.01, 0.3),
'n_estimators': stats.randint(50, 300),
'subsample': stats.uniform(0.6, 0.4)
}
random_search = RandomizedSearchCV(
estimator=XGBClassifier(use_label_encoder=False, eval_metric='logloss'),
param_distributions=param_dist,
n_iter=20,
cv=3,
scoring='accuracy',
verbose=1,
random_state=42
)
random_search.fit(X_train, y_train)
print('Best hyperparameters (random search):', random_search.best_params_)
best_random_model = random_search.best_estimator_
y_pred_random = best_random_model.predict(X_test)
print('Test accuracy (random search):', accuracy_score(y_test, y_pred_random))

Output
Fitting 3 folds for each of 20 candidates, totalling 60 fits
Best hyperparameters (random search): {'learning_rate': 0.123, 'max_depth': 5, 'n_estimators': 150, 'subsample': 0.85}
Test accuracy (random search): 0.9473684210526315

Troubleshooting
- If you encounter overfitting, reduce max_depth, lower subsample and colsample_bytree, or increase min_child_weight and gamma.
- If training is slow, reduce n_estimators or use early stopping with XGBClassifier.
- In xgboost 1.x, set use_label_encoder=False and eval_metric explicitly to avoid warnings; the use_label_encoder parameter was removed in xgboost 2.0.
Key Takeaways
- Use GridSearchCV or RandomizedSearchCV from scikit-learn to systematically tune XGBoost hyperparameters.
- Focus on tuning max_depth, learning_rate, n_estimators, and subsample for the biggest gains.
- Consider advanced optimization libraries like Optuna for more efficient hyperparameter search.
- In xgboost 1.x, set use_label_encoder=False and specify eval_metric to avoid deprecation warnings; the parameter was removed in xgboost 2.0.
- Monitor for overfitting and adjust parameters accordingly, using early stopping if needed.