How to beginner · 3 min read

How to detect outliers in pandas

Quick answer
Use pandas with statistical methods like the interquartile range (IQR) or Z-score to detect outliers in your data. Calculate these metrics on your DataFrame columns and filter rows where values fall outside typical bounds.

PREREQUISITES

  • Python 3.8+
  • pip install pandas>=1.5

Setup

Install pandas if not already installed and import it for data manipulation.

bash
pip install pandas

Step by step

This example shows how to detect outliers in a pandas DataFrame using the IQR method and Z-score method.

python
import pandas as pd
import numpy as np

# Sample data
np.random.seed(0)
data = pd.DataFrame({
    'value': np.append(np.random.normal(50, 5, 100), [100, 105, 110])
})

# IQR method
Q1 = data['value'].quantile(0.25)
Q3 = data['value'].quantile(0.75)
IQR = Q3 - Q1

outliers_iqr = data[(data['value'] < (Q1 - 1.5 * IQR)) | (data['value'] > (Q3 + 1.5 * IQR))]

print("Outliers detected by IQR method:")
print(outliers_iqr)

# Z-score method
mean_val = data['value'].mean()
std_val = data['value'].std()
z_scores = (data['value'] - mean_val) / std_val

outliers_z = data[(z_scores < -3) | (z_scores > 3)]

print("\nOutliers detected by Z-score method:")
print(outliers_z)
output
Outliers detected by IQR method:
    value
100    100.0
101    105.0
102    110.0

Outliers detected by Z-score method:
    value
100    100.0
101    105.0
102    110.0

Common variations

You can customize the threshold multiplier in the IQR method (default 1.5) or the Z-score cutoff (default 3) to be more or less sensitive. For multivariate data, consider using Mahalanobis distance or clustering methods.

python
def detect_outliers_iqr(df, column, multiplier=1.5):
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    return df[(df[column] < (Q1 - multiplier * IQR)) | (df[column] > (Q3 + multiplier * IQR))]

outliers_custom = detect_outliers_iqr(data, 'value', multiplier=2)
print("Outliers with multiplier=2:")
print(outliers_custom)
output
Outliers with multiplier=2:
    value
102    110.0

Troubleshooting

If no outliers are detected, verify your data distribution and thresholds. Extremely skewed data may require transformations like log or Box-Cox before outlier detection.

Key Takeaways

  • Use the IQR method to detect outliers by filtering values outside 1.5 times the interquartile range.
  • Z-score method flags points beyond 3 standard deviations from the mean as outliers.
  • Adjust thresholds to tune sensitivity based on your dataset's characteristics.
Verified 2026-04
Verify ↗