How to use LightGBM in Python

Direct answer

Use the lightgbm Python package: wrap your training data in an lgb.Dataset, train a model with lgb.train(), and call predict() on the Booster it returns.

Setup

Install

bash

pip install lightgbm numpy scikit-learn

Imports

python

import lightgbm as lgb
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Examples

Input: Train LightGBM on the breast cancer dataset with default parameters

Output: Accuracy on test set: 0.95

Input: Train LightGBM with 100 boosting rounds and early stopping

Output: Accuracy on test set: 0.96

Input: Predict probabilities for test samples

Output: [0.02, 0.98, 0.15, ...]

Integration steps

Install LightGBM and dependencies using pip
Load and split your dataset into training and testing sets
Create a LightGBM Dataset object from training data
Define training parameters and train the model with lgb.train()
Use the trained model to predict on test data
Evaluate predictions with metrics like accuracy_score

Full code

python

import lightgbm as lgb
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create LightGBM dataset
train_data = lgb.Dataset(X_train, label=y_train)

# Define parameters
params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'verbose': -1
}

# Train model
model = lgb.train(params, train_data, num_boost_round=100)

# Predict
y_pred_prob = model.predict(X_test)
# Convert probabilities to binary predictions
y_pred = (y_pred_prob > 0.5).astype(int)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on test set: {accuracy:.2f}")

output

Accuracy on test set: 0.95

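API trace

LightGBM runs in-process, so no network request is made. The JSON below is only a schematic: the "request" mirrors the arguments passed to lgb.train(), and the "response" sketches the Booster object it returns.

Request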

json

{"params": {"objective": "binary", "metric": "binary_logloss", "verbose": -1}, "train_data": {"features": [[...]], "labels": [...]}, "num_boost_round": 100}

Response

json

{"model": {"booster": "gbdt", "num_trees": 100, "feature_names": [...], "tree_info": [...]}}

Extract: Use the Booster object returned by lgb.train() and call its predict() method for inference.

Variants

Using the LightGBM scikit-learn API

Use the scikit-learn API (LGBMClassifier) for simpler integration with scikit-learn pipelines and the familiar fit/predict interface.

python

import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize classifier
clf = lgb.LGBMClassifier(n_estimators=100)

# Train
clf.fit(X_train, y_train)

# Predict
y_pred = clf.predict(X_test)

# Evaluate
print(f"Accuracy on test set: {accuracy_score(y_test, y_pred):.2f}")

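Early stopping with a validation set

Use early stopping to prevent overfitting: training stops once the validation metric has not improved for a given number of rounds. Because the same validation set also picks the stopping point, the accuracy printed at the end is slightly optimistic.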

python

import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

train_data = lgb.Dataset(X_train, label=y_train)
val_data = lgb.Dataset(X_val, label=y_val, reference=train_data)

params = {'objective': 'binary', 'metric': 'binary_logloss', 'verbose': -1}

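# LightGBM 4.x removed early_stopping_rounds from lgb.train(); use the early_stopping callback
model = lgb.train(
    params,
    train_data,
    num_boost_round=1000,
    valid_sets=[val_data],
    callbacks=[lgb.early_stopping(stopping_rounds=10)],
)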

y_pred_prob = model.predict(X_val, num_iteration=model.best_iteration)
y_pred = (y_pred_prob > 0.5).astype(int)
print(f"Accuracy with early stopping: {accuracy_score(y_val, y_pred):.2f}")

Multiclass classification example

Use this pattern for multiclass classification tasks with LightGBM.

python

import lightgbm as lgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

train_data = lgb.Dataset(X_train, label=y_train)
params = {'objective': 'multiclass', 'num_class': 3, 'metric': 'multi_logloss', 'verbose': -1}

model = lgb.train(params, train_data, num_boost_round=100)

y_pred_prob = model.predict(X_test)
y_pred = y_pred_prob.argmax(axis=1)
print(f"Multiclass accuracy: {accuracy_score(y_test, y_pred):.2f}")

Performance

Latency: ~200 ms per 100 boosting rounds on a typical CPU; actual time grows with the number of rows, features, and leaves

Cost: Free, open-source library; no API cost

Rate limits: None; runs locally

Use early stopping to cut training time once extra rounds stop improving the validation metric
Limit num_boost_round to avoid overfitting and long training
Pass categorical columns natively (categorical_feature in lgb.Dataset) instead of one-hot encoding them, which reduces preprocessing

Approach	Latency	Cost/call	Best for
LightGBM native API	~200ms	Free	Full control and speed
LightGBM sklearn API	~250ms	Free	Easy integration with sklearn pipelines
LightGBM with early stopping	~220ms	Free	Prevent overfitting with validation

Quick tip

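Use LightGBM's Dataset class for large datasets: features are binned once when the Dataset is constructed, and Dataset.save_binary() lets you reload the binned data later without reprocessing the raw input.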

Common mistake

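Forgetting to convert predicted probabilities to class labels when doing classification. With the native API, Booster.predict() returns probabilities for binary and multiclass objectives, so threshold them (binary) or take the argmax (multiclass) before computing accuracy. LGBMClassifier.predict() already returns labels.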

Verified 2026-04 · lightgbm
