How to detect outliers in pandas

Use pandas with a statistical rule such as the interquartile range (IQR) or the Z-score to detect outliers in your data. Compute the metric for a DataFrame column, then filter the rows whose values fall outside the resulting bounds.

PREREQUISITES

Python 3.8+
pip install "pandas>=1.5"

Setup

Install pandas if it is not already installed. The examples also use NumPy, which pip installs automatically as a pandas dependency.

bash

pip install pandas

Step by step

This example shows how to detect outliers in a pandas DataFrame using the IQR method and Z-score method.

python

import pandas as pd
import numpy as np

# Sample data
np.random.seed(0)
data = pd.DataFrame({
    'value': np.append(np.random.normal(50, 5, 100), [100, 105, 110])
})

# IQR method
Q1 = data['value'].quantile(0.25)
Q3 = data['value'].quantile(0.75)
IQR = Q3 - Q1

outliers_iqr = data[(data['value'] < (Q1 - 1.5 * IQR)) | (data['value'] > (Q3 + 1.5 * IQR))]

print("Outliers detected by IQR method:")
print(outliers_iqr)

# Z-score method
mean_val = data['value'].mean()
std_val = data['value'].std()
z_scores = (data['value'] - mean_val) / std_val

outliers_z = data[(z_scores < -3) | (z_scores > 3)]

print("\nOutliers detected by Z-score method:")
print(outliers_z)

output

Outliers detected by IQR method:
     value
100  100.0
101  105.0
102  110.0

Outliers detected by Z-score method:
     value
100  100.0
101  105.0
102  110.0
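
The Z-score step above can also be wrapped in a small helper so that the cutoff becomes a parameter. The sketch below is one possible way to do it; the function name `detect_outliers_zscore` and its `threshold` argument are illustrative choices, not part of pandas.

```python
import numpy as np
import pandas as pd

def detect_outliers_zscore(df, column, threshold=3.0):
    """Return the rows whose absolute z-score in `column` exceeds `threshold`.

    Hypothetical helper for illustration; not a pandas API.
    """
    col = df[column]
    # Series.std() uses the sample standard deviation (ddof=1) by default
    z = (col - col.mean()) / col.std()
    return df[z.abs() > threshold]

# Same sample data as in the step-by-step example
np.random.seed(0)
data = pd.DataFrame({
    'value': np.append(np.random.normal(50, 5, 100), [100, 105, 110])
})

outliers = detect_outliers_zscore(data, 'value', threshold=3)
print(outliers)
```

Using `z.abs() > threshold` is equivalent to checking both tails separately. Lowering `threshold` makes the check more sensitive, and raising it makes the check stricter.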

Common variations

You can change the IQR multiplier (conventionally 1.5) or the Z-score cutoff (conventionally 3) to make detection more or less sensitive: larger values flag fewer points. For multivariate data, consider Mahalanobis distance or clustering-based methods.

python

def detect_outliers_iqr(df, column, multiplier=1.5):
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    return df[(df[column] < (Q1 - multiplier * IQR)) | (df[column] > (Q3 + multiplier * IQR))]

outliers_custom = detect_outliers_iqr(data, 'value', multiplier=2)
print("Outliers with multiplier=2:")
print(outliers_custom)

output

Outliers with multiplier=2:
     value
100  100.0
101  105.0
102  110.0
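
For the multivariate case mentioned above, a Mahalanobis distance check might look like the following minimal sketch. The helper `mahalanobis_sq`, the synthetic `points` data, and the 0.999 cutoff are all assumptions made for illustration. The closed-form cutoff used here is the chi-square quantile for exactly two columns; other dimensions need a different cutoff.

```python
import numpy as np
import pandas as pd

def mahalanobis_sq(df):
    """Squared Mahalanobis distance of each row from the column means.

    Hypothetical helper for illustration; assumes the covariance
    matrix of `df` is invertible.
    """
    diff = (df - df.mean()).to_numpy()
    inv_cov = np.linalg.inv(df.cov().to_numpy())
    # Row-wise quadratic form: diff[i] @ inv_cov @ diff[i]
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
    return pd.Series(d2, index=df.index)

# Synthetic data: y closely follows x, plus one point that breaks the pattern
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 200)
y = x + rng.normal(0, 0.1, 200)
points = pd.DataFrame({'x': np.append(x, 3.0), 'y': np.append(y, -3.0)})

d2 = mahalanobis_sq(points)

# For 2 columns the chi-square quantile has the closed form -2 * ln(1 - p);
# p = 0.999 is an arbitrary choice for this sketch
cutoff = -2 * np.log(1 - 0.999)
outliers_md = points[d2 > cutoff]
print(outliers_md)
```

The added point (3, -3) is not extreme in either column on its own, so a per-column IQR or Z-score check would likely miss it. It stands out only because it contradicts the correlation between `x` and `y`.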

Troubleshooting

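If no outliers are detected, check your data's distribution and your thresholds. Extreme values inflate the mean and standard deviation, so the Z-score method can miss them, especially in small samples. Heavily skewed data may need a transformation such as log or Box-Cox before outlier detection.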
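
As a rough illustration of how a transformation changes the result, the sketch below applies the same IQR rule to a small right-skewed series before and after a log transform. The helper `iqr_mask` and the sample values are hypothetical. `np.log` requires strictly positive values.

```python
import numpy as np
import pandas as pd

def iqr_mask(series, multiplier=1.5):
    """Boolean mask that is True where a value lies outside the IQR fences."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - multiplier * iqr) | (series > q3 + multiplier * iqr)

# Hypothetical right-skewed data: each value doubles the previous one
skewed = pd.Series([1, 2, 4, 8, 16, 32, 64, 128, 256], dtype=float)

# IQR rule on the raw values
raw_outliers = skewed[iqr_mask(skewed)]

# IQR rule on the log scale, mapped back to the original values
log_outliers = skewed[iqr_mask(np.log(skewed))]

print("Raw scale:", raw_outliers.tolist())
print("Log scale:", log_outliers.tolist())
```

On the raw scale the largest value is flagged only because of the skew. On the log scale the values are evenly spaced and nothing is flagged. Whether the raw or the transformed result is the right one depends on what counts as unusual in your data.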

Key Takeaways

Use the IQR method to flag values more than 1.5 × IQR below Q1 or above Q3.
The Z-score method flags points more than 3 standard deviations from the mean.
Adjust thresholds to tune sensitivity based on your dataset's characteristics.

Verified 2026-04