How to use StandardScaler in Scikit-learn

Quick answer
Use StandardScaler from sklearn.preprocessing to standardize features by removing the mean and scaling to unit variance. Fit the scaler on training data with fit(), then apply it with transform(); fit_transform() does both in one call.
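Concretely, each feature column x is transformed to z = (x - mean) / std, where the mean and standard deviation are computed per column from the data passed to fit(). The sketch below, assuming NumPy is available, checks this by comparing StandardScaler's output with the same formula written by hand. StandardScaler uses the population standard deviation (ddof=0), which is also NumPy's default for std().

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small example matrix: 3 samples, 2 features
X = np.array([[1.0, 2.0], [2.0, 0.0], [0.0, 1.0]])

# Standardize with scikit-learn
X_sk = StandardScaler().fit_transform(X)

# Same computation by hand: per-column mean and population std (ddof=0)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_sk, X_manual))  # True
```

One practical difference from the hand-written formula: for a constant column, whose standard deviation is zero, StandardScaler uses a scale of 1 instead of dividing by zero, so that column becomes all zeros rather than NaN.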

PREREQUISITES

  • Python 3.8+
  • pip install "scikit-learn>=1.0"

Setup

If scikit-learn is not already installed, install it with pip. StandardScaler is then imported from sklearn.preprocessing.

bash
pip install scikit-learn

Step by step

This example shows how to fit a StandardScaler on training data and transform both training and test data.

python
from sklearn.preprocessing import StandardScaler
import numpy as np

# Sample training data
X_train = np.array([[1.0, 2.0], [2.0, 0.0], [0.0, 1.0]])
# Sample test data
X_test = np.array([[1.0, 1.0], [0.0, 0.0]])

# Initialize the scaler
scaler = StandardScaler()

# Fit on training data
scaler.fit(X_train)

# Transform training data
X_train_scaled = scaler.transform(X_train)

# Transform test data
X_test_scaled = scaler.transform(X_test)

print("Scaled training data:\n", X_train_scaled)
print("Scaled test data:\n", X_test_scaled)
output
Scaled training data:
 [[ 0.          1.22474487]
 [ 1.22474487 -1.22474487]
 [-1.22474487  0.        ]]
Scaled test data:
 [[ 0.          0.        ]
 [-1.22474487 -1.22474487]]
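The scaled values come from per-column statistics that fit() stores on the scaler. The sketch below reuses the same training and test data, reads the fitted attributes mean_, var_ and scale_, and recomputes the test-set transform by hand. This makes it explicit that transform() reuses the training statistics rather than recomputing them from the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 2.0], [2.0, 0.0], [0.0, 1.0]])
X_test = np.array([[1.0, 1.0], [0.0, 0.0]])

scaler = StandardScaler().fit(X_train)

# Statistics learned from the training data only
print("mean_: ", scaler.mean_)   # per-feature mean
print("var_:  ", scaler.var_)    # per-feature population variance (ddof=0)
print("scale_:", scaler.scale_)  # per-feature std, used as the divisor

# transform() applies the training statistics to new data
manual_test = (X_test - scaler.mean_) / scaler.scale_
print(np.allclose(manual_test, scaler.transform(X_test)))  # True
```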

Common variations

You can call fit_transform() on training data to fit and transform in one step. To chain preprocessing with other steps, wrap StandardScaler in a Pipeline from sklearn.pipeline. StandardScaler also accepts sparse matrices when constructed with with_mean=False (centering is not supported for sparse input), and it can undo the scaling with inverse_transform().

python
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6]])

pipeline = Pipeline([
    ('scaler', StandardScaler())
])

# Fit and transform in one step
X_scaled = pipeline.fit_transform(X)

print("Scaled data using pipeline:\n", X_scaled)
output
Scaled data using pipeline:
 [[-1.22474487 -1.22474487]
 [ 0.          0.        ]
 [ 1.22474487  1.22474487]]
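The remaining variations can be sketched in a few lines. The example below assumes SciPy is installed (it is a scikit-learn dependency) and shows three things: fit_transform() matches fit() followed by transform(); inverse_transform() maps scaled values back to the original units; and sparse input works only with with_mean=False, because subtracting the mean would turn stored zeros into non-zeros and destroy sparsity.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# fit_transform() is equivalent to fit() followed by transform()
one_step = StandardScaler().fit_transform(X)
two_step = StandardScaler().fit(X).transform(X)
print(np.allclose(one_step, two_step))  # True

# inverse_transform() undoes the scaling using the stored mean_ and scale_
scaler = StandardScaler().fit(X)
X_back = scaler.inverse_transform(scaler.transform(X))
print(np.allclose(X_back, X))  # True

# Sparse input: centering is not allowed, so disable it with with_mean=False
X_sparse = sparse.csr_matrix(X)
X_sparse_scaled = StandardScaler(with_mean=False).fit_transform(X_sparse)
print(sparse.issparse(X_sparse_scaled))  # True: the output stays sparse

# With the default with_mean=True, sparse input raises a ValueError
raised = False
try:
    StandardScaler().fit(X_sparse)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```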

Troubleshooting

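  • If transformed data has unexpected values, make sure you call fit() only on the training data; refitting on the test data changes the statistics and leaks test information into preprocessing.
  • Infinite values make StandardScaler raise a ValueError. NaNs are ignored during fit() and passed through unchanged by transform(), so impute them first if your model cannot handle missing values.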
  • Remember StandardScaler assumes numeric input; non-numeric data must be encoded first.
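To see how StandardScaler treats problem values, the sketch below feeds it a column containing a NaN and then one containing an infinite value. In scikit-learn 1.0 and later, NaNs are treated as missing: they are left out of the statistics and kept as NaN by transform(), while infinite values are rejected with a ValueError. np.isnan and np.isinf are used here to count problem values before scaling.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# A NaN is ignored in fit() and kept as NaN by transform()
X_nan = np.array([[1.0], [np.nan], [3.0]])
scaler = StandardScaler().fit(X_nan)
print("mean_ ignoring NaN:", scaler.mean_)  # computed from 1.0 and 3.0 only
X_nan_scaled = scaler.transform(X_nan)
print(X_nan_scaled.ravel())

# Count problem values before scaling
X_inf = np.array([[1.0], [np.inf], [3.0]])
print("NaNs:", int(np.isnan(X_nan).sum()), "infs:", int(np.isinf(X_inf).sum()))

# Infinite values are rejected with a ValueError
inf_rejected = False
try:
    StandardScaler().fit(X_inf)
except ValueError as err:
    inf_rejected = True
    print("ValueError:", err)
```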

Key Takeaways

  • Use fit() on training data and transform() on both training and test data to avoid data leakage.
  • fit_transform() combines fitting and transforming for convenience on training data.
  • StandardScaler standardizes features by removing the mean and scaling to unit variance, which helps models that are sensitive to feature scale, such as regularized linear models, SVMs, and k-nearest neighbors.
  • Integrate StandardScaler into Pipeline for clean and reproducible preprocessing workflows.
  • Check input data for infinite or non-numeric values before scaling, since both cause errors; NaNs pass through scaling unchanged and may still need imputing.
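The leakage and pipeline points combine naturally: when StandardScaler is the first step of a Pipeline, calling fit() on the pipeline fits the scaler on the training split only, and score() or predict() reuse those statistics on the test split. The sketch below uses LogisticRegression and the bundled iris dataset purely as stand-ins for your own model and data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The scaler is fitted inside the pipeline, on X_train only
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# score() scales X_test with the training mean_ and scale_, then predicts
accuracy = model.score(X_test, y_test)
print("Test accuracy:", accuracy)

# make_pipeline names each step after its lowercased class name
print("Learned means:", model.named_steps["standardscaler"].mean_)
```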
Verified 2026-04